ABSTRACT


The multi-backend database system (MBDS) in the Laboratory for Database System Research at the Naval Postgraduate School is designed to overcome the performance-gain and capacity-growth problems of either the traditional database system or the single-backend-software database system. The original MBDS supported four primary operations, namely, RETRIEVE, DELETE, UPDATE and INSERT.

This thesis presents the design and implementation of the fifth primary operation, the RETRIEVE-COMMON operation. The retrieve-common operation is used to merge two files by their common attribute values. First, the overall design and implementation of MBDS is reviewed. Then, several alternatives are compared and analyzed to select the best one as our design and implementation approach. Finally, we describe the detailed design and the implementation. Our goal is to maximize the utilization of, and minimize the effects on, the existing system.

To integrate our design into MBDS, several modifications are made. The algorithms for the modifications and their program specifications are also provided in Chapters IV and V and in the Appendices.
I. INTRODUCTION


A. THE SCOPE OF THE THESIS 


A database is a collection of stored operational data, and a database system is a computer-based system whose overall purpose is to record and maintain information (data) [Ref. 1]. The traditional approach to managing the database system is to run the database system software as an application program in a mainframe computer system. The database system must share the use and the control of the mainframe computer resources with all of the other applications of the computer system. The performance of this approach suffers whenever the usage of either the computer system or the database applications increases.

One solution to this problem is to offload the database system from the mainframe to a single, dedicated backend computer. The backend computer has its own disk storage and is used to perform database operations exclusively [Refs. 2,3]. This approach is known as the single software backend approach. Database systems based on this approach are referred to as software single-backend database systems. However, this approach still has a disadvantage: performance upgrades require the replacement of the backend, and this may entail software modifications and hardware disruption [Ref. 8 : p. 4].

A second approach to solving the database performance problem is to develop a special-purpose database machine with specially designed hardware. However, the cost-effectiveness of this approach, known as the hardware backend approach, has not yet been demonstrated [Ref. 5].


In order to overcome the performance-gain and capacity-growth problems of either the traditional database system or the single-backend database system, research on a multi-backend database system, known as MBDS, is being conducted in the Laboratory for Database Systems Research at the Naval Postgraduate School. Instead of a single backend computer, MBDS uses several identical (both in hardware and in software) minicomputers as its backend computers in a parallel fashion in order to achieve performance gain and capacity growth. These backends with their respective disk systems are connected to another minicomputer, called the backend controller. The controller is responsible for supervising the execution of parallel database operations on the backends and for interfacing with the hosts and the user. Users access the system either by way of the host or through the controller directly (as shown in Figure 1.1).


Figure 1.1 The Multi-Backend Database System (MBDS).
The attribute-based data language (ABDL) [Ref. 6] is used as the basis of the data language of MBDS. Currently, ABDL supports four primary database operations: RETRIEVE, DELETE, UPDATE and INSERT. The functions of these four database operations are shown in Figure 1.2.


| Operation | Function                           |
| RETRIEVE  | Retrieve records from the database |
| DELETE    | Delete records from the database   |
| UPDATE    | Modify records of the database     |
| INSERT    | Insert records into the database   |

Figure 1.2 The Functions of the Current MBDS Database Operations.


In order to make MBDS a more complete database system, a fifth operation, the RETRIEVE-COMMON operation, which is used to merge two files by common attribute values, has been proposed [Ref. 7]. This thesis focuses on the design and implementation of the RETRIEVE-COMMON operation of MBDS. We will propose several alternative design and implementation strategies, then evaluate and analyze these alternatives based on their time complexities, their effects on the existing system and the design goals of MBDS. According to the results of the analysis, we will choose the best alternative to design and implement the fifth operation.
B. THE ORGANIZATION OF THE THESIS


The rest of this thesis is organized as follows. In chapter II we give an overview of the architecture of MBDS. We describe the design goals, the underlying and intended hardware, the process structure, the data model and the data language of MBDS. In chapter III we first define the intended operation and the syntax of the RETRIEVE-COMMON operation, and then evaluate and analyze the alternatives for the design and implementation. According to the analysis, we select the best alternative for adding the retrieve-common operation into MBDS. In chapter IV, we present the details of the design for the selected approach. We also consider the possible effects of this approach on the existing system. In chapter V, we describe how to incorporate our design into MBDS. Our goal is to minimize the effects of the implementation. Finally, the thesis is summarized and concluded in chapter VI. It is hoped that this thesis will provide definite help to future work on MBDS.
II. THE MULTI-BACKEND DATABASE SYSTEM

In this chapter we briefly review the configuration and the theory of operation of MBDS. Most of the information provided in this chapter has been extracted from [Refs. 4,7: pp. 31-68, 7-20]. Interested readers are encouraged to refer to these references.


A. THE SYSTEM GOALS 


As mentioned in chapter I, MBDS is designed to overcome the performance problems and upgrade issues of the traditional mainframe-based database system or the software single-backend database system. In other words, the overall goal for MBDS is to prove that:

(1) the system is easily extensible; and
(2) the performance gain and improvement are proportional to the multiplicity of processing and storage elements [Ref. 4 : pp. 1-5].
In order to achieve the aforementioned goal, the design requirements and their correlated design issues for designing and implementing MBDS have been defined in [Ref. 7].


1. Design Requirements

There are three main design requirements for MBDS.
(1) The system must be expandable.
(2) Both the hardware and the software are generic.
(3) The database is evenly distributed across the disk systems of the backends, and transactions are processed by the backends in parallel and concurrently.




The first two design requirements allow backends to be added for performance enhancement and capacity growth: the new backends are of the same type and run the existing system software. With the third requirement, the performance gain (in terms of response-time reduction) and the capacity growth (in terms of response-time invariance) of the system are likely to be in proportion to the number of backends of the system.
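The two kinds of improvement can be made concrete with an idealized model. The sketch below is an assumption, not a measurement of MBDS: it takes the response time of a request to be proportional to the number of records one backend must process, and it ignores controller and communication overhead. Under that model, spreading a fixed database over more backends shortens the response time (performance gain), while growing the database in step with the number of backends leaves the response time unchanged (capacity growth).

```python
# Idealized model (assumption): response time is proportional to the number of
# records a single backend must process; controller and bus costs are ignored.

def response_time(total_records: int, backends: int, time_per_record: float = 1.0) -> float:
    """Response time when the records are evenly distributed over the backends."""
    if backends < 1:
        raise ValueError("need at least one backend")
    return (total_records / backends) * time_per_record


# Performance gain: same database, more backends -> shorter response time.
base = response_time(1_000_000, 1)
for n in (1, 2, 4, 8):
    print(n, base / response_time(1_000_000, n))  # the speedup equals n

# Capacity growth: database grows with the backends -> invariant response time.
for n in (1, 2, 4, 8):
    print(n, response_time(250_000 * n, n))  # always 250000.0
```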
2. Design Issues 


There are several issues which must be resolved in order to meet the design requirements of MBDS. The first issue concerns the backend controller. As Figure 1.1 suggests, the controller may become a primary bottleneck of the system. In order to avoid this problem, the functions of the controller should be minimized and reduced to the pre-processing of user transactions, the post-processing of transaction results, the sending and receiving of data between the backends and the host, and the arbitration of data insertion into the database.

The second design issue addresses the characteristics and functionality of the communication bus between the controller and the backends. The bus should be cost-effective and efficient for both backend communication and backend addition.

The third class of issues involves the backends of the system. The backends must have identical software to allow replication of the software on a new backend. Additionally, the backends must have complete software to perform all of the database management functions. These functions include directory management, concurrency control, record processing and communication.

The fourth design issue concerns the database. The database should be evenly distributed across all the disk systems of the backends.
The fifth design issue is the choice of a data model and a data language. The data model should easily support the required data distribution and data placement of the database. The data language for the system is, of course, based on the chosen data model. It must capture all of the primary operations of the database system. The chosen data model is the attribute-based data model, and the data language is the attribute-based data language.

The sixth design issue focuses on minimizing the communications traffic of the system. The controller should communicate with the backends only for sending the pre-processed user transactions, for arbitrating the data placement, and for receiving results. The backends should communicate with the controller only for sending the results of the user transactions. Communication among backends should be held to a minimum.


The seventh issue deals with the directory placement strategies. In order to enable each backend to perform all the database management functions and to minimize the communication among backends, the directory data are duplicated at each backend.


B. THE UNDERLYING AND INTENDED HARDWARE 


An overview of the MBDS hardware organization is shown in Figure 2.1. User access is accomplished through a host computer, which in turn communicates with the controller. When a transaction (either a request or a set of requests) is received, the controller broadcasts the transaction to all the backends. Since the data of all data files are evenly distributed across all the backends, all backends can execute the same request in parallel. A queue of requests is maintained in each backend. When a backend finishes executing one request, it sends the results of that request to the controller and can start executing the next request independently of the other backends.

Figure 2.1 The MBDS Hardware Organization.
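The request flow just described can be pictured with a small single-process simulation. Everything in it is hypothetical: the class names `Backend` and `Controller`, records as Python dictionaries, a request modelled as a record filter, and round-robin placement standing in for the even data distribution of MBDS. It shows only the pattern: the controller broadcasts each request to every backend, each backend keeps its own request queue and works on its own partition, and the controller merely combines the partial results.

```python
from collections import deque
from typing import Callable

Record = dict[str, object]
Request = Callable[[Record], bool]  # hypothetical: a request is a record filter


class Backend:
    def __init__(self, number: int) -> None:
        self.number = number
        self.records: list[Record] = []       # this backend's share of the database
        self.queue: deque[Request] = deque()  # requests waiting at this backend

    def run_next(self) -> list[Record]:
        """Execute the oldest queued request against the local partition only."""
        request = self.queue.popleft()
        return [r for r in self.records if request(r)]


class Controller:
    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends

    def place(self, records: list[Record]) -> None:
        # Round-robin placement keeps the partitions evenly sized.
        for i, record in enumerate(records):
            self.backends[i % len(self.backends)].records.append(record)

    def broadcast(self, request: Request) -> None:
        for backend in self.backends:
            backend.queue.append(request)

    def collect(self) -> list[Record]:
        # Each backend runs on its own data; the controller only gathers results.
        results: list[Record] = []
        for backend in self.backends:
            results.extend(backend.run_next())
        return results


controller = Controller([Backend(n) for n in range(4)])
controller.place([{"FILE": "Employee", "DEPT": d, "SALARY": s}
                  for d, s in [("Toy", 30000), ("Shoe", 25000), ("Toy", 18000), ("Shoe", 40000)]])
controller.broadcast(lambda r: r["DEPT"] == "Toy")
print(len(controller.collect()))  # 2
```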

Originally, MBDS was designed to be configured with a number of microprocessor-based processing units and their disk subsystems, connected by a broadcast-based communications line. When the implementation of MBDS began, neither the microprocessor-based computers nor the broadcast-based communications devices were available. The present MBDS is configured with a VAX-11/780 (VMS OS) as both the host and the controller, and two PDP-11/44s (RSX-11M OS) and their disk systems as the backends. Communication between computers is accomplished by time-division-multiplexed buses, known as parallel communication links (PCLs). The broadcasting bus is simulated by the PCL.


Currently, MBDS is being down-loaded to an initial configuration of eight microprocessor-based, broadcast-bus-connected, and Winchester-drive-supported workstations, with one of the eight being used as the controller and the others as the backends. This workstation (Sun-2/170, 4.2 BSD UNIX OS) has the Motorola MC68010 as the CPU with 16 Mbytes of virtual space per process and uses Ethernet as the broadcast bus among workstations. The disk drives on the backends are Fujitsu Eagle Winchester-type drives, with a formatted capacity of 380 Mbytes per drive.


C. THE DATA MODEL AND THE DATA LANGUAGE 


In this section we first introduce the concept and terminology of the attribute-based data model, which is the data model used in MBDS, and then describe the data language in which users may issue requests to MBDS.
1. The Attribute-based Data Model


MBDS uses the attribute-based data model as its data model. In the attribute-based data model, data is modeled with the constructs: database, file, record, attribute-value pair (keyword), directory keyword, directory, record body, keyword predicate, and query. Informally, a database is a collection of files; each file contains a group of records which are characterized by a unique set of directory keywords. A record is composed of two parts. The first part is a collection of attribute-value pairs or keywords. An attribute-value pair is a member of the Cartesian product of the attribute name and the value domain of the attribute. As an example, <SALARY, 30000> is an attribute-value pair having 30000 as the value for the attribute SALARY. All the attributes in a record are required to be distinct. Certain attribute-value pairs of a record (or a file) are called the directory keywords of that record (file), because either the attribute-value pairs or the ranges of their attribute values are kept in the directory for addressing the record (file). The rest of the record is textual information which is referred to as the record body.


The angle brackets, < and >, enclose an attribute-value pair. The curly brackets, { and }, enclose the record body. The parentheses, ( and ), enclose a record. The first attribute-value pair of all records of a file is the same: the attribute is FILE and the value is the file name. An example of a record of the Employee file is shown below:

(<FILE, Employee>, <JOB, Mgr>, <DEPT, Toy>, <SALARY, 30000>, {Employee Description})
The record has four keywords and a record body containing the employee description.
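The record structure above can be mirrored in a few lines of Python. This is an illustrative representation only; the class `Record` and its fields are hypothetical and are not MBDS data structures. The keywords are kept as an ordered list of attribute-value pairs, the first pair must be <FILE, file name>, attribute names must be distinct, and the record body is free text.

```python
from dataclasses import dataclass


@dataclass
class Record:
    keywords: list[tuple[str, object]]  # attribute-value pairs, FILE first
    body: str = ""                      # textual record body

    def __post_init__(self) -> None:
        if not self.keywords or self.keywords[0][0] != "FILE":
            raise ValueError("the first keyword must be <FILE, file name>")
        names = [attr for attr, _ in self.keywords]
        if len(names) != len(set(names)):
            raise ValueError("attribute names in a record must be distinct")

    def value(self, attribute: str) -> object:
        return dict(self.keywords).get(attribute)

    def __str__(self) -> str:
        pairs = ", ".join(f"<{a}, {v}>" for a, v in self.keywords)
        return f"({pairs}, {{{self.body}}})"


employee = Record([("FILE", "Employee"), ("JOB", "Mgr"), ("DEPT", "Toy"), ("SALARY", 30000)],
                  "Employee Description")
print(employee)
# (<FILE, Employee>, <JOB, Mgr>, <DEPT, Toy>, <SALARY, 30000>, {Employee Description})
```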


A keyword predicate, or simply predicate, is of the form

(attribute, relational operator, value).

Without confusion, we also use parentheses to enclose a predicate. A relational operator can be one of =, !=, <, =<, >, and >=. For example, (SALARY > 20000) is a predicate. A keyword K is said to satisfy a predicate T if the attribute of K is identical to the attribute in T and the relation specified by the relational operator of T holds between the value of K and the value in T. For example, the keyword <SALARY, 30000> satisfies the predicate (SALARY > 20000).
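The satisfaction rule can be written directly as a function. In this sketch, which is an illustration and not MBDS code, a keyword is an `(attribute, value)` tuple and a predicate an `(attribute, operator, value)` tuple; the operator spellings follow the list above, with `=<` for less-than-or-equal.

```python
import operator

# Relational operators of ABDL predicates, mapped to Python comparisons.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "=<": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def satisfies(keyword: tuple[str, object], predicate: tuple[str, str, object]) -> bool:
    """Keyword K satisfies predicate T if the attributes match and the relation holds."""
    k_attr, k_value = keyword
    t_attr, op, t_value = predicate
    return k_attr == t_attr and OPERATORS[op](k_value, t_value)


print(satisfies(("SALARY", 30000), ("SALARY", ">", 20000)))  # True
print(satisfies(("SALARY", 30000), ("WAGES", ">", 20000)))   # False: different attribute
```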

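A query consists of several keyword predicates in disjunctive normal form. An example of a query is:

((DEPT=Toy) and ((SALARY<30000) or (SALARY>20000)))

which, written out in disjunctive normal form, is (((DEPT=Toy) and (SALARY<30000)) or ((DEPT=Toy) and (SALARY>20000))).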
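Because a query is in disjunctive normal form, it can be held as a list of conjunctions, each a list of predicates: a record satisfies the query if, for at least one conjunction, every predicate is satisfied by one of the record's keywords. The sketch below is a self-contained illustration under assumed encodings, not MBDS code; records are dictionaries from attribute to value, which works because attribute names within a record are distinct.

```python
import operator

OPERATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
             "=<": operator.le, ">": operator.gt, ">=": operator.ge}

Predicate = tuple[str, str, object]
Query = list[list[Predicate]]  # a disjunction of conjunctions (DNF)


def record_satisfies(record: dict[str, object], query: Query) -> bool:
    def holds(pred: Predicate) -> bool:
        attr, op, value = pred
        return attr in record and OPERATORS[op](record[attr], value)

    return any(all(holds(p) for p in conjunction) for conjunction in query)


# ((DEPT=Toy) and (SALARY<30000)) or ((DEPT=Toy) and (SALARY>20000))
query: Query = [
    [("DEPT", "=", "Toy"), ("SALARY", "<", 30000)],
    [("DEPT", "=", "Toy"), ("SALARY", ">", 20000)],
]
print(record_satisfies({"FILE": "Employee", "DEPT": "Toy", "SALARY": 30000}, query))   # True
print(record_satisfies({"FILE": "Employee", "DEPT": "Shoe", "SALARY": 25000}, query))  # False
```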
2. The Attribute-based Data Language 


The data manipulation language of MBDS, the attribute-based data language (ABDL), is a non-procedural language which originally supports four primary database operations: RETRIEVE, INSERT, DELETE and UPDATE. It is the purpose of this thesis to design and implement the fifth primary database operation, the RETRIEVE-COMMON operation.

The RETRIEVE request is used to retrieve records of the database. The syntax of a RETRIEVE request is shown below:

RETRIEVE Query (Target-List) [BY Attribute] [WITH Pointer]

The query specifies which records are to be retrieved. The target-list is a list of output attributes. It may also consist of an aggregate operator on one or more output attributes. MBDS supports five aggregate operators: AVG, COUNT, SUM, MIN and MAX. The BY-clause and the WITH-clause are optional. The BY-clause may be used to group records when an aggregate operation is specified. The WITH-clause may be used to specify whether pointers to the retrieved records must be returned to the user or user program for later use in an update request. Some examples of retrieve requests are shown below.


Example 1. Retrieve the names of all employees who work in the Toy department.

RETRIEVE ((FILE=Employee) and (DEPT=Toy)) (NAME)

Example 2. List the average salary of each department.

RETRIEVE (FILE=Employee) (AVG(SALARY)) BY DEPT
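The effect of Example 2 can be mimicked in plain Python to show what the BY-clause does: the records selected by the query are grouped on the BY attribute, and the aggregate is applied within each group. The sample records and the helper name `retrieve_avg_by` are made up for illustration, and the sketch says nothing about how MBDS actually evaluates aggregates across backends.

```python
from collections import defaultdict
from statistics import mean

employees = [
    {"FILE": "Employee", "NAME": "Ada", "DEPT": "Toy", "SALARY": 30000},
    {"FILE": "Employee", "NAME": "Bob", "DEPT": "Toy", "SALARY": 20000},
    {"FILE": "Employee", "NAME": "Cy", "DEPT": "Shoe", "SALARY": 26000},
]


def retrieve_avg_by(records, file_name, attribute, by):
    """Mimics RETRIEVE (FILE=file_name) (AVG(attribute)) BY by."""
    groups = defaultdict(list)
    for record in records:
        if record.get("FILE") == file_name:  # the query part
            groups[record[by]].append(record[attribute])
    return {key: mean(values) for key, values in groups.items()}


print(retrieve_avg_by(employees, "Employee", "SALARY", "DEPT"))
# {'Toy': 25000, 'Shoe': 26000}
```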


The INSERT request is used to insert a record into the database. The syntax of an INSERT request is:

INSERT Record

The following example inserts a record into the Employee file.

INSERT (<FILE, Employee>, <SALARY, 30000>, <DEPT, Toy>)


The syntax of a DELETE request is:

DELETE Query

where the query specifies the record(s) to be removed from the database. The following example deletes records from the Employee file.

DELETE ((FILE=Employee) and (SALARY=30000) and (DEPT=Toy))
The UPDATE request is used to modify records of the database. The syntax of the UPDATE request is:

UPDATE Query <Modifier>

where the query specifies the particular records to be updated in the database and the modifier specifies the kind of modification to be made to the records that satisfy the query. The following example gives a $1000 raise to all employees.

UPDATE (FILE=Employee) <SALARY=SALARY+1000>
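An UPDATE pairs a query with a modifier that computes a new value for one attribute. The sketch below is a hypothetical in-memory rendering of the example above, not the MBDS update path: the modifier is modelled as a target attribute plus a function of the old record, and only records satisfying the query are changed.

```python
from typing import Callable

Record = dict[str, object]


def update(records: list[Record], query: Callable[[Record], bool],
           attribute: str, new_value: Callable[[Record], object]) -> int:
    """Apply <attribute = new_value(record)> to every record satisfying the query."""
    changed = 0
    for record in records:
        if query(record):
            record[attribute] = new_value(record)
            changed += 1
    return changed


db = [
    {"FILE": "Employee", "DEPT": "Toy", "SALARY": 30000},
    {"FILE": "Employee", "DEPT": "Shoe", "SALARY": 25000},
    {"FILE": "Dept", "DEPT": "Toy", "FLOOR": 2},
]
# UPDATE (FILE=Employee) <SALARY=SALARY+1000>
n = update(db, lambda r: r["FILE"] == "Employee", "SALARY", lambda r: r["SALARY"] + 1000)
print(n, [r.get("SALARY") for r in db])  # 2 [31000, 26000, None]
```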

The RETRIEVE-COMMON request is used to merge two files by common attribute values. It is discussed in detail in the later chapters.


D. THE PROCESS STRUCTURE 


MBDS is a message-oriented system. In a message-oriented system, each process corresponds to one system function. These processes communicate among themselves by passing messages. The processes are created at system start time and exist until the system is stopped. Figure 2.2 provides an overview of the MBDS process structure.


Communication between computers in MBDS is achieved by using the PCL. MBDS provides a software abstraction of this bus for each computer in order to emulate broadcast capabilities. The abstraction consists of two complementary processes. The first process, get-pcl, gets messages from other computers off the PCL. The second process, put-pcl, puts messages on the bus to be broadcast to other computers. Every computer, whether it is the controller or a backend, has its own get-pcl and put-pcl.
Figure 2.2 The MBDS Process Structure.
There are 31 message types and one general message format used in the MBDS message-passing facilities. The format (shown in Figure 2.3) is used for each of the three message-passing facilities, namely, messages within the controller, messages within the backends, and messages between computers.


| Message Field    | Data Type                                                     |
| Message Type     | a numeric code                                                |
| Message Sender   | a numeric code                                                |
| Message Receiver | a numeric code                                                |
| Message Text     | an alphanumeric field terminated by an end-of-message marker |

Figure 2.3 The General Format of MBDS Messages.
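The general format can be illustrated with a small encode/decode pair. The four fields follow Figure 2.3, but the concrete encoding is an assumption for illustration only: the three numeric codes are written as decimal numbers separated by blanks, and a hypothetical end-of-message marker `$EOM` terminates the text. The codes in the usage line are arbitrary.

```python
from dataclasses import dataclass

END_OF_MESSAGE = "$EOM"  # hypothetical end-of-message marker


@dataclass
class Message:
    msg_type: int  # message type, a numeric code
    sender: int    # message sender, a numeric code
    receiver: int  # message receiver, a numeric code
    text: str      # alphanumeric message text

    def encode(self) -> str:
        if END_OF_MESSAGE in self.text:
            raise ValueError("text must not contain the end-of-message marker")
        return f"{self.msg_type} {self.sender} {self.receiver} {self.text}{END_OF_MESSAGE}"

    @classmethod
    def decode(cls, raw: str) -> "Message":
        if not raw.endswith(END_OF_MESSAGE):
            raise ValueError("missing end-of-message marker")
        body = raw[: -len(END_OF_MESSAGE)]
        msg_type, sender, receiver, text = body.split(" ", 3)
        return cls(int(msg_type), int(sender), int(receiver), text)


wire = Message(6, 1, 4, "RETRIEVE (FILE=Employee) (NAME)").encode()
print(wire)  # 6 1 4 RETRIEVE (FILE=Employee) (NAME)$EOM
```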


Messages between computers are divided into two classes: messages between backends and messages between the controller and the backends. Figure 2.4 describes each of the MBDS message types.


2. The Test Interface Process


The test interface process allows the user to interact with MBDS directly. Since MBDS does not use a host computer, the test interface process is contained in the controller.
3. The Processes of the Controller


In addition to the communications and test-interface processes, the controller consists of three additional processes: Request Preparation (REQP), Insert Information Generation (IIG) and Post Processing (PP). REQP receives, parses and formats a request (transaction) before sending the formatted request (transaction) to the directory-management process in each backend. IIG is used to provide additional information to the backends when an insert request is received. PP is used to collect all the results of a request (transaction) and forward the results to the user.


4. The Processes of Each Backend
In addition to the communication processes, each backend also consists of three other processes: Record Processing (RECP), Directory Management (DM) and Concurrency Control (CC).

DM controls the execution of a request at a backend and accesses the secondary-storage-based directory tables. It determines the disk addresses where the relevant data of a particular request are stored and then sends those disk addresses to RECP.

CC is used to ensure the consistency of the database while allowing concurrent execution of multiple requests.

RECP performs the disk I/O operations and the other operations specified by the request. It receives the secondary-storage addresses from DM and processes the request. The results are then forwarded to the controller.
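The division of labor between directory management and record processing can be pictured as a two-stage pipeline. In the hypothetical sketch below, the directory is reduced to a table from keywords to sets of disk addresses; the DM stage intersects the address sets of a request's equality predicates, and the record-processing stage reads only those addresses and re-checks every predicate. Real MBDS directories work with descriptors and clusters and also handle ranges of attribute values; only the flow of addresses from DM to record processing is shown.

```python
# Hypothetical disk: address -> record (one backend's share of the database).
disk = {
    100: {"FILE": "Employee", "DEPT": "Toy", "JOB": "Mgr"},
    101: {"FILE": "Employee", "DEPT": "Shoe", "JOB": "Clerk"},
    102: {"FILE": "Employee", "DEPT": "Toy", "JOB": "Clerk"},
}

# Hypothetical directory: keyword -> disk addresses of records holding it.
directory: dict[tuple[str, object], set[int]] = {}
for address, record in disk.items():
    for keyword in record.items():
        directory.setdefault(keyword, set()).add(address)


def directory_management(request: dict[str, object]) -> set[int]:
    """DM stage: candidate addresses for a conjunction of equality predicates."""
    address_sets = [directory.get(item, set()) for item in request.items()]
    return set.intersection(*address_sets) if address_sets else set(disk)


def record_processing(request: dict[str, object], addresses: set[int]) -> list[dict]:
    """Record-processing stage: read only the given addresses, keep qualifying records."""
    fetched = (disk[a] for a in sorted(addresses))
    return [r for r in fetched if all(r.get(k) == v for k, v in request.items())]


request = {"FILE": "Employee", "DEPT": "Toy"}
addresses = directory_management(request)
print(sorted(addresses))                           # [100, 102]
print(len(record_processing(request, addresses)))  # 2
```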
| No. | Message Type                                       | Src  | Dest | Path |
| 1   | TRAFFIC UNIT                                       | HOST | REQP | HC   |
| 2   | REQUEST RESULTS                                    | PP   | HOST | CH   |
| 3   | NUMBER OF REQUESTS IN A TRANSACTION                | REQP | PP   | C    |
| 4   | AGGREGATE OPERATORS                                | REQP | PP   | C    |
| 5   | REQUESTS WITH ERRORS                               | REQP | PP   | C    |
| 6   | PARSED TRAFFIC UNIT                                | REQP | DM   | CB   |
| 7   | NEW DESCRIPTOR ID                                  | IIG  | DM   | CB   |
| 8   | BACKEND NUMBER                                     | IIG  | DM   | CB   |
| 9   | CLUSTER ID                                         | DM   | IIG  | BC   |
| 10  | REQUEST FOR NEW DESCRIPTOR ID                      | DM   | IIG  | BC   |
| 11  | BACKEND RESULTS FOR A REQUEST                      | RECP | PP   | BC   |
| 12  | BACKEND AGGREGATE OPERATOR RESULTS                 | RECP | PP   | BC   |
| 13  | RECORD THAT HAS CHANGED CLUSTER                    | RECP | REQP | BC   |
| 14  | RESULTS OF A RETRIEVE OR FETCH CAUSED BY AN UPDATE | RECP | REQP | BC   |
| 15  | DESCRIPTOR IDS                                     | DM   | DMs  | BB   |
| 16  | REQUEST AND DISK ADDRESSES                         | DM   | RECP | B    |
| 17  | CHANGED CLUSTER RESPONSE                           | DM   | RECP | B    |
| 18  | FETCH                                              | DM   | RECP | B    |
| 19  | OLD AND NEW VALUES OF ATTRIBUTE BEING MODIFIED     | RECP | DM   | B    |
| 20  | ATTRIBUTES FOR A TRAFFIC UNIT                      | DM   | CC   | B    |
| 21  | DESC-ID GROUPS FOR A TRAFFIC UNIT                  | DM   | CC   | B    |
| 22  | CLUSTER IDS FOR A TRAFFIC UNIT                     | DM   | CC   | B    |
| 23  | RELEASE ATTRIBUTE                                  | DM   | CC   | B    |
| 24  | RELEASE ALL ATTRIBUTES FOR AN INSERT               | DM   | CC   | B    |
| 25  | RELEASE DESCRIPTOR-ID GROUPS                       | DM   | CC   | B    |
| 26  | ATTRIBUTE LOCKED                                   | CC   | DM   | B    |
| 27  | DESCRIPTOR-ID GROUPS LOCKED                        | CC   | DM   | B    |
| 28  | CLUSTER IDS LOCKED                                 | CC   | DM   | B    |
| 29  | NO MORE GENERATED INSERTS                          | RECP | REQP | BC   |
| 29  | NO MORE GENERATED INSERTS                          | REQP | DM   | CB   |
| 29  | NO MORE GENERATED INSERTS                          | DM   | RECP | B    |
| 30  | REQUEST ID OF A FINISHED REQUEST                   | RECP | CC   | B    |
| 31  | AN UPDATE REQUEST HAS FINISHED                     | RECP | DM   | B    |
| 31  | AN UPDATE REQUEST HAS FINISHED                     | DM   | CC   | B    |

SOURCE OR DESTINATION DESIGNATION:
HOST : HOST MACHINE (TEST-INT)
REQP : REQUEST PREPARATION
IIG  : INSERT INFORMATION GENERATION
PP   : POST PROCESSING
DM   : DIRECTORY MANAGEMENT
RECP : RECORD PROCESSING
CC   : CONCURRENCY CONTROL

PATH DESIGNATION:
H : HOST
C : CONTROLLER
B : A BACKEND

Figure 2.4 The MBDS Message Types.


In this chapter, we introduce the terminology and notation of the "Retrieve-Common" request, investigate and analyze several possible design and implementation approaches, and then select the best one to design and implement the Retrieve-Common operation for MBDS. The selection of an approach is based on the design requirements and the design issues of MBDS.


A. THE INTENDED OPERATION 


1. An Operation on Two Files


The RETRIEVE-COMMON request is used to merge two files by common attribute values. The common attribute values are the attribute values which belong to the records of both files. For example, suppose there are two files: file A and file B. File A contains the records of the street names of the city of San Jose:


(<FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>)
(<FILE, A>, <STREET, SECOND>, <CITY, SAN JOSE>)

File B consists of the records of the city names of Monterey County:

(<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>)
(<FILE, B>, <CITY, SEASIDE>, <COUNTY, MONTEREY>)
The RETRIEVE-COMMON request can provide us with a third file, say, file C, containing information such as "all the records of both files A and B where the street name of the record in file A is identical to the city name of the record in file B." One of the records in file C which satisfies the request would be

(<FILE, C>, <FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>, <FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>)


Logically, the retrieve-common request involves two retrieval operations. We define the first retrieval operation as the source retrieve and the second retrieval operation as the target retrieve. The set of all the records that belong to the result of the source retrieve is called the source record set. The set of all the records that belong to the result of the target retrieve is called the target record set. A source (target) record is a record that belongs to the source (target) record set. Similarly, their attributes will be referred to as source attributes and target attributes. The merged source and target records are termed the result record set. The aforementioned file C is a result record set.

We term the source and target attribute names that participate in the retrieve-common operation the join attribute names, or briefly join attributes. Their values, however, are termed common attribute values, or simply common values. The retrieve-common operation requires that the join attribute specified in the source record set have the same domain as the join attribute in the target record set, although the two need not have the same attribute name.

Consider another example. Suppose the source records are characterized by the attributes Employee_name and Wages, and the target records are characterized by Rank and Wages. Further, let the domain of Employee_name be the character strings and the domain of both Rank and Wages be the integers. A retrieve-common operation may be performed by merging on the wage values of the respective source and target records. A retrieve-common operation may also be performed by merging on the wages of the source records and the ranks of the target records, since their value domains are the same. However, a merge between the employee names and the ranks would not be permitted, since their domains are different.
The logical operation for the retrieve-common request can be described as follows.

(1) All records satisfying the source retrieve are collected.
(2) All records satisfying the target retrieve are collected.
(3) The records of the two collections are pairwise merged on the common (source and therefore target) attribute values.
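The three logical steps can be sketched as follows, using files A and B from the earlier example. This illustrates the logical operation only and is not the MBDS implementation: both retrievals are plain filters, the same-domain requirement on the join attributes is approximated by comparing Python value types, and step (3) indexes the target records on their join value (a hash join) instead of comparing every source record with every target record. Merging a pair is modelled, as an assumption, by concatenating the two keyword lists.

```python
from collections import defaultdict

Record = list[tuple[str, object]]  # keywords in order, FILE first


def value_of(record: Record, attribute: str) -> object:
    return dict(record).get(attribute)


def retrieve_common(source_query, target_query, files: list[Record],
                    source_attr: str, target_attr: str) -> list[Record]:
    # (1) and (2): collect the source and target record sets.
    sources = [r for r in files if source_query(r)]
    targets = [r for r in files if target_query(r)]

    # Join attributes must share a domain (approximated here by Python type).
    src_types = {type(value_of(r, source_attr)) for r in sources}
    tgt_types = {type(value_of(r, target_attr)) for r in targets}
    if sources and targets and src_types != tgt_types:
        raise TypeError("join attributes must have the same domain")

    # (3): index the targets by common value, then merge the matching pairs.
    by_value = defaultdict(list)
    for t in targets:
        by_value[value_of(t, target_attr)].append(t)
    return [s + t for s in sources for t in by_value.get(value_of(s, source_attr), [])]


files = [
    [("FILE", "A"), ("STREET", "MONTEREY"), ("CITY", "SAN JOSE")],
    [("FILE", "A"), ("STREET", "SECOND"), ("CITY", "SAN JOSE")],
    [("FILE", "B"), ("CITY", "MONTEREY"), ("COUNTY", "MONTEREY")],
    [("FILE", "B"), ("CITY", "SEASIDE"), ("COUNTY", "MONTEREY")],
]
result = retrieve_common(lambda r: value_of(r, "FILE") == "A",
                         lambda r: value_of(r, "FILE") == "B",
                         files, "STREET", "CITY")
for record in result:
    print(record)
```

Only the MONTEREY street of file A finds a partner in file B, so the result contains a single merged record.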


2. The Syntax of Retrieve-Common


When developing the syntax of the retrieve-common request, we must attempt to design a data language construct that is syntactically similar to the other primary operations of ABDL. In particular, the syntax of the retrieve-common operation should resemble the syntax of the ABDL retrieve operation given below:

RETRIEVE Query (Target-list) [BY Attribute] [WITH Pointer]


Using the above syntax as a guideline, we define the syntax for the retrieve-common request as follows.

RETRIEVE Query-1 (Target-list-1) [BY Attribute] [WITH Pointer]
COMMON (Attribute-1, Attribute-2)
RETRIEVE Query-2 (Target-list-2) [BY Attribute] [WITH Pointer]


The retrieve-common request consists of three parts. The first part is what we have referred to as the source retrieve request, which retrieves the source record set. The second part is the specification of the join attributes, where Attribute-1 belongs to the source record and Attribute-2 belongs to the target record. Although the values of these two attributes must be the same in order to satisfy the condition for merging the respective records, their attribute names need not be identical. The third part is what has been referred to as the target retrieve request, which retrieves the target record set.
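To make the three-part structure concrete, the sketch below splits a retrieve-common request into its source retrieve, its join attributes and its target retrieve with a regular expression. It is a hypothetical helper, not the MBDS request parser, and it assumes that the word COMMON does not occur inside either query and that attribute names consist of letters, digits and underscores.

```python
import re
from dataclasses import dataclass

PATTERN = re.compile(
    r"^\s*(RETRIEVE\b.*?)\s+COMMON\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*(RETRIEVE\b.*?)\s*$",
    re.DOTALL,
)


@dataclass
class RetrieveCommon:
    source_retrieve: str   # part 1: RETRIEVE Query-1 (Target-list-1) ...
    source_attribute: str  # Attribute-1, taken from the source record
    target_attribute: str  # Attribute-2, taken from the target record
    target_retrieve: str   # part 3: RETRIEVE Query-2 (Target-list-2) ...


def split_request(text: str) -> RetrieveCommon:
    match = PATTERN.match(text)
    if match is None:
        raise ValueError("not a retrieve-common request")
    return RetrieveCommon(*match.groups())


req = split_request("RETRIEVE (FILE=A) (STREET, CITY) COMMON (STREET, CITY) "
                    "RETRIEVE (FILE=B) (CITY, COUNTY)")
print(req.source_attribute, req.target_attribute)  # STREET CITY
```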


B. AN ANALYSIS OF DIFFERENT DESIGNS 


In order to make this thesis self-contained, several possible design approaches described in [Ref. 8] are reviewed in this section.

The main issue when considering alternative strategies for implementing the retrieve-common request is where the merge of the source and the target records should be performed.

There are three major alternatives for distributing the workload of the retrieve-common request.

(1) The controller does all of the merge operation.
(2) The backends do all of the merge operation.
(3) The controller and the backends share the workload of the merge.


Each of these alternatives will be analyzed and judged using the design requirements and design issues of MBDS.

In order to simplify the analysis of design (or implementation) strategies, we make the following assumptions.
(1) The records of the source record set and the records of the target record set are distributed evenly across the backends.
(2) The retrieve-common operation is performed as described in the previous section.


1. The Controller Does All the Merge Operation


In this alternative, each backend only performs the two retrieval operations and then sends the records of the source record set and the records of the target record set to the controller. Upon receiving all the source records and target records from all the backends, the controller performs the merge operation and sends the results to the host computer.


2. The Controller and the Backends Share the Merge Operation


Each backend performs the merge operation over its own source records and target records. The merged records, along with the source and target record sets, are then sent to the controller. The controller performs the merge operation over the source and target record sets coming from different backends and then sends the results, together with the previously merged records (produced by the individual backends), to the host.
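The division of work in this alternative can be checked on a small example. The hypothetical sketch below reduces records to bare join values and places source and target values on four backends. Each backend merges its own pairs; the controller must then compare every source record of one backend with the target records of every other backend.

```python
from itertools import product


def local_merges(partitions):
    """Each backend merges its own source and target records (join values only)."""
    return [(s, t) for sources, targets in partitions
            for s, t in product(sources, targets) if s == t]


def controller_merges(partitions):
    """The controller merges source and target records from *different* backends."""
    results, comparisons = [], 0
    for i, (sources, _) in enumerate(partitions):
        for j, (_, targets) in enumerate(partitions):
            if i == j:
                continue  # already merged locally by backend i
            for s, t in product(sources, targets):
                comparisons += 1
                if s == t:
                    results.append((s, t))
    return results, comparisons


# Hypothetical placement of join values on four backends: (sources, targets).
partitions = [([1, 2], [2, 5]), ([3, 4], [1, 7]), ([5, 6], [3, 8]), ([7, 8], [4, 6])]
local = local_merges(partitions)
remote, comparisons = controller_merges(partitions)
print(len(local), len(remote), comparisons)  # 1 7 48
```

With four backends the controller performs 48 of the 64 comparisons and finds 7 of the 8 matching pairs, so most of the merge work lands on the controller.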


3. The Backends Do All the Merge Operation


This alternative may be further broken into two subalternatives.

(a) The backends share the merge operation. The backends send either source or target records to each other. Let us assume that the target records are sent. Each backend then has a portion of the source record set and the whole set of target records. The backends perform the merge operation over their own source records and all of the target records, and send the results to the controller.

(b) One designated backend performs the merge operation. All records of both the source record set and the target record set are sent to the designated backend from all of the other backends. The designated backend performs the entire merge operation and sends the results to the controller.


4. An Analysis of the Design Approaches 


Four alternatives for distributing the workload of the merge operation among the controller and the backends have been discussed in the previous subsection. We now examine these alternatives against the design goals of MBDS.

Alternative 1, where the controller performs the entire merge operation, will increase the workload of the controller. Recall that in Chapter II we stressed that, in order to reduce the chance of the controller becoming the bottleneck of the system, we minimize the work of the controller. Alternative 1 violates this design requirement. Therefore, it will not be considered further.

Alternative 2 will increase the communications load and increase the workload of the controller. This alternative complicates the first and the sixth design issues of MBDS. Therefore, it will also be eliminated from the design consideration.

Alternative 3a meets the design issues of minimizing the controller function and distributing the workload to each backend evenly. Alternative 3b does not increase the workload of the controller; nor does it distribute the workload to each backend. Furthermore, transmitting all the records of both the source record set and the target record set will increase the communications overhead. In addition, performing the entire merge operation in one backend will unbalance the workload, thereby reducing the parallelism of the backends, i.e., by having a single backend perform the entire merge while all other backends stay idle. Alternative 3b thus runs counter to the third and sixth design issues, so this alternative is also eliminated.

From this analysis, we choose alternative 3a as our design approach.
C. AN ANALYSIS OF DIFFERENT IMPLEMENTATIONS


Three different implementations for merging the source and the target record sets are considered:
(1) A straightforward implementation.
(2) An implementation based on sorting and matching.
(3) An implementation based on bucket-hashing.


1. The Straightforward Implementation


In this alternative, the merging operation is based on the nest-loop algorithm [Ref. 8: p. 86], which is shown in Figure 3.1.

This alternative is accomplished in five phases:

(1) Each backend determines its own source records and stores them into a predefined portion of the secondary storage area.

(2) Each backend determines its own target records and stores them into another predefined portion of the secondary storage area.
PROCEDURE Nest_loop_merge
   FOR each record in the source record set DO
      FOR each record in the target record set DO
         IF the merging condition is satisfied
         THEN
            form a result record
         END IF
      END FOR
   END FOR
END PROCEDURE Nest_loop_merge

Figure 3.1 The Nest-loop Merge Procedure.


(3) Each backend broadcasts its own local target records to all of the other backends.

(4) Each backend receives the broadcast target records from the other backends and stores them into the secondary storage together with its own target records.

(5) Each backend brings its own source records and the entire target record set into the primary memory, performs the nest-loop merging operation, and then sends the merged results to the controller.
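The heart of phase (5) is the nest-loop procedure of Figure 3.1. The following minimal Python sketch runs that procedure in memory. It assumes records are dictionaries and that the merging condition is equality on one common attribute; the names `nest_loop_merge` and `attr` are illustrative only, and the disk staging of phases (1) to (4) is left out.

```python
from typing import Any

Record = dict[str, Any]


def nest_loop_merge(source: list[Record], target: list[Record], attr: str) -> list[Record]:
    """Compare every source record with every target record (Figure 3.1)."""
    results: list[Record] = []
    for s in source:                    # outer loop: the backend's own source records
        for t in target:                # inner loop: the entire target record set
            if s[attr] == t[attr]:      # assumed merging condition: equal common values
                results.append({**s, **t})  # form a result record from both records
    return results


source = [{"emp": "A", "dept": 1}, {"emp": "B", "dept": 2}]
target = [{"dept": 1, "dname": "Sales"}, {"dept": 3, "dname": "Ops"}]
print(nest_loop_merge(source, target, "dept"))
```

Because every source record meets every target record, the number of comparisons is the product of the two set sizes, which is what the analysis of this approach reflects.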


2. The Implementation Based on Sorting and Matching


The idea of this implementation is based on the following inference. Since the retrieve-common operation is simply a merging operation on two record sets, if we can have these two files presorted by the values of their common attributes, then the merging operation may be performed efficiently by matching the values of the common attributes of the records of these two files.
There are two possible alternatives for performing the sort-match algorithm.


(a) The backends do all of the sorting and matching operations.

(b) The backends and the controller share the sorting and matching operations.

Alternative (b) will increase the workload of the controller and contradict the design goals of MBDS, and is therefore eliminated from consideration. Only alternative (a) will be examined. Alternative (a) accomplishes the retrieve-common operation in four phases.


(1) Each backend retrieves, sorts and stores its own source records and target records separately, and then broadcasts one of the two record sets to the other backends. (Let's assume that the target records are transmitted.)

(2) Each backend receives the incoming non-local target records and merges them into its own local target records.

(3) Each backend performs the matching operation over its own portion of source records and the entire set of target records (from all the backends).

(4) The backends send the results to the controller.
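Phase (3) is essentially a sort-merge join. The sketch below assumes both record sets fit in primary memory, which the thesis points out is generally not the case (that is why external sorting algorithms come into play), and it pairs up runs of equal common values on either side. `sort_match_merge` is a hypothetical name chosen for illustration.

```python
from typing import Any

Record = dict[str, Any]


def sort_match_merge(source: list[Record], target: list[Record], attr: str) -> list[Record]:
    """Sort both record sets on the common attribute, then match them in one pass."""
    src = sorted(source, key=lambda r: r[attr])
    tgt = sorted(target, key=lambda r: r[attr])
    results: list[Record] = []
    i = j = 0
    while i < len(src) and j < len(tgt):
        a, b = src[i][attr], tgt[j][attr]
        if a < b:
            i += 1                      # source value too small: advance source
        elif a > b:
            j += 1                      # target value too small: advance target
        else:
            # Find the run of equal values on each side and pair every combination.
            i_end = i
            while i_end < len(src) and src[i_end][attr] == a:
                i_end += 1
            j_end = j
            while j_end < len(tgt) and tgt[j_end][attr] == a:
                j_end += 1
            for s in src[i:i_end]:
                for t in tgt[j:j_end]:
                    results.append({**s, **t})
            i, j = i_end, j_end
    return results


src = [{"k": 3, "s": "x"}, {"k": 1, "s": "y"}, {"k": 3, "s": "z"}]
tgt = [{"k": 3, "t": "p"}, {"k": 2, "t": "q"}]
print(sort_match_merge(src, tgt, "k"))
```

After sorting, the matching pass itself is linear in the two set sizes (plus the size of the output), so the cost of this approach is dominated by the sort.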


3. The Implementation Based on Bucket-Hashing
This implementation strategy attempts to speed up the comparison and merge by hashing records into small groups (the buckets of the hashing table) which contain records with common attribute values, so that the time complexity of the merging operation may be reduced.
A hashing function applied to the common attribute values is used to hash records into buckets. The bucket numbers are consecutive integers. Instead of using primary and overflow areas, the buckets use one or more fixed-size blocks to store records. The number of blocks may vary among buckets. Details of the hashing table, the buckets and the blocks will be described in the next chapter.
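A bucket that grows by chaining fixed-size blocks, rather than by spilling into an overflow area, can be sketched as below. The block capacity of 4 records, the class names, and the dictionary keyed by bucket number are assumptions made for illustration; the actual layout is the subject of the next chapter.

```python
from dataclasses import dataclass, field
from typing import Any

BLOCK_CAPACITY = 4  # hypothetical number of records per fixed-size block


@dataclass
class Block:
    records: list[Any] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.records) >= BLOCK_CAPACITY


@dataclass
class Bucket:
    blocks: list[Block] = field(default_factory=list)

    def add(self, record: Any) -> None:
        # Allocate a new block only when the bucket is empty or its last block is full.
        if not self.blocks or self.blocks[-1].is_full():
            self.blocks.append(Block())
        self.blocks[-1].records.append(record)

    def records(self) -> list[Any]:
        return [r for b in self.blocks for r in b.records]


table: dict[int, Bucket] = {}  # bucket number -> bucket; block counts may differ
for value in range(10):
    table.setdefault(value % 2, Bucket()).add(value)
print(len(table[0].blocks), table[0].records())
```

Since every block has the same size, reading a bucket costs one disk transfer per block, which is the unit Ti counts in the analysis that follows.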

Those source records and target records within the same bucket will be examined and merged if the merging condition is satisfied. This alternative can also be broken into two subalternatives.

(a) One common hashing table is used for both source and target record sets.

(b) Two separate tables are used, one for each record set.

a. One Common Hashing Table


This alternative is accomplished by each backend in four phases:

(1) All local source records are hashed and stored into blocks according to their hashed values. These blocks (and therefore buckets) are termed source blocks.

(2) After all the local source records have been hashed, the local target records are hashed one at a time and buffered. If a target record is hashed into an empty source bucket, then it is buffered for transmission to the other backends. Otherwise, all the records in the source bucket are retrieved and merged with that target record only if the merging condition is satisfied. The results are first buffered and then sent to the controller.

(3) Since the non-local target records may arrive at a backend while the backend is processing some other records, each backend places these incoming records in a predefined secondary storage area.

(4) Each backend retrieves the non-local target records from the secondary storage area and processes them in the same way as it processes its local target records.
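The probe step shared by phases (2) and (4) can be sketched as follows. The toy hashing function `h` (value modulo 8) and the function names are assumptions for illustration, and the buffering of target records for transmission is not modelled.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Record = dict[str, Any]
Table = dict[int, list[Record]]


def build_source_table(source: Iterable[Record], attr: str,
                       h: Callable[[Any], int]) -> Table:
    """Phase (1): hash every local source record into its bucket."""
    table: Table = defaultdict(list)
    for s in source:
        table[h(s[attr])].append(s)
    return table


def probe(table: Table, targets: Iterable[Record], attr: str,
          h: Callable[[Any], int]) -> list[Record]:
    """Phases (2) and (4): hash each target record and merge it with one source bucket."""
    results: list[Record] = []
    for t in targets:
        for s in table.get(h(t[attr]), []):   # empty bucket: nothing to compare
            if s[attr] == t[attr]:            # same bucket does not imply equal values
                results.append({**s, **t})
    return results


def h(v: int) -> int:
    return v % 8  # hypothetical toy hashing function


table = build_source_table([{"k": 1, "s": "a"}, {"k": 9, "s": "b"}], "k", h)
print(probe(table, [{"k": 9, "t": "x"}, {"k": 2, "t": "y"}], "k", h))
```

Values 1 and 9 collide in bucket 1, so the explicit equality test inside the bucket is still needed; each target record, however, is compared only with the records of a single bucket.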
b. Separate Hashing Tables


This alternative is accomplished in three phases.

(1) The backends hash and store their own source records and target records into two separate hashing tables by a common hashing function. After all of the target records have been hashed and stored, each backend broadcasts the hashed results of its target records (i.e., the bucket numbers and the records associated with those bucket numbers) to all of the other backends.

(2) Upon receiving all of the target information from the other backends, each backend stores those target records into the appropriate buckets according to their bucket numbers.

(3) The backends perform the merge operation on the local source records and the entire set of target records and send the results to the controller. The procedure is shown in Figure 3.2.
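The three phases can be simulated in a single process to show how the pieces fit. In this sketch, which assumes in-memory tables, a toy hashing function, and hypothetical function names, each inner list stands for one backend's local records, and the broadcast of phase (1) is modelled by collecting every backend's target buckets into one table by bucket number. `hashing_merge` follows the bucket-by-bucket loop of the Hashing_merge procedure, but visits only bucket numbers that are non-empty in the source table, which yields the same result as scanning from the minimum to the maximum bucket value.

```python
from collections import defaultdict
from typing import Any, Callable

Record = dict[str, Any]
Table = dict[int, list[Record]]


def hash_into_table(records: list[Record], attr: str, h: Callable[[Any], int]) -> Table:
    """Group records into buckets keyed by bucket number."""
    table: Table = defaultdict(list)
    for r in records:
        table[h(r[attr])].append(r)
    return table


def hashing_merge(src_table: Table, tgt_table: Table, attr: str) -> list[Record]:
    """Merge two tables bucket by bucket, nest-loop style inside each bucket."""
    results: list[Record] = []
    for bucket in sorted(src_table):              # bucket numbers in increasing order
        src_recs, tgt_recs = src_table[bucket], tgt_table.get(bucket, [])
        if not src_recs or not tgt_recs:          # both buckets must be non-empty
            continue
        for s in src_recs:
            for t in tgt_recs:
                if s[attr] == t[attr]:
                    results.append({**s, **t})
    return results


def run_backends(sources: list[list[Record]], targets: list[list[Record]],
                 attr: str, h: Callable[[Any], int]) -> list[Record]:
    """Simulate M backends; sources[i] and targets[i] are backend i's local records."""
    # Phase (1): each backend hashes its own records into two separate tables.
    src_tables = [hash_into_table(s, attr, h) for s in sources]
    tgt_tables = [hash_into_table(t, attr, h) for t in targets]
    # Phases (1)-(2): after the broadcast every backend holds the same full
    # target table, so the simulation builds it only once.
    full_target: Table = defaultdict(list)
    for table in tgt_tables:
        for bucket, recs in table.items():
            full_target[bucket].extend(recs)
    # Phase (3): each backend merges its local source table with the full target table.
    results: list[Record] = []
    for src_table in src_tables:
        results.extend(hashing_merge(src_table, full_target, attr))
    return results


def h(v: int) -> int:
    return v % 4  # hypothetical toy hashing function


sources = [[{"k": 1, "s": "a"}], [{"k": 2, "s": "b"}]]
targets = [[{"k": 2, "t": "x"}], [{"k": 1, "t": "y"}, {"k": 5, "t": "z"}]]
print(run_backends(sources, targets, "k", h))
```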


4. A Comparison of the Three Implementation Approaches


In this section we compare and analyze these implementation approaches. Since the backends work in parallel, our analysis focuses only on how much time it takes one backend to carry out a particular strategy. There are common operations that each backend performs under every strategy, so the time complexities of these operations can be ignored when comparing the implementation strategies. The times of these common operations are:
PROCEDURE Hashing_merge
   FOR the bucket value = min value TO max value DO
      IF the buckets of both tables are not empty
      THEN
         retrieve all the records from both buckets
         perform the merge operation based on
            the straightforward algorithm
      END IF
   END FOR
END PROCEDURE Hashing_merge

Figure 3.2 The Hashing_merge Procedure.


(1) the time to process the records for the source request, which involves determining which records of the database satisfy the query, projecting the attribute-value pairs of the target-list of the satisfied records and forming a source record set;

(2) the time to process the records for the target request, which involves determining which records of the database satisfy the query, projecting the attribute-value pairs of the target-list of the satisfied records and forming a target record set;

(3) the time to broadcast the local target records to the other backends; and

(4) the time to send the merged results to the controller.


The following notations are introduced to simplify the ensuing analysis.
Cs : Cardinality of the source record set in one backend.
Ct : Cardinality of the target record set in one backend.
Cb : Average number of records in a bucket.
M  : Number of backends.
   : Number of index entries in the hashing table.
Ti : Average time to read (write) a block of records from (to) secondary storage.
Tb : Average time to read (write) a record from (to) a bucket.
Tc : Average time to compare the common attribute values of two records.
Th : Average time to hash a record.
Tm : Average time to merge two records.


a. An Analysis for the Straightforward Implementation


We recall that there are five phases in this implementation, as discussed in a previous section.

Phase 1: Since there are Cs local source records in each backend, the time complexity for storing them into the secondary storage is:

Ti*(Cs/Cb).

Phase 2: Since there are Ct local target records in each backend, the time complexity for storing them into the secondary storage is:

Ti*(Ct/Cb).

Phase 3: The time complexity for this phase is ignored, since broadcasting the local target records is one of the common operations listed above.

Phase 4: Since each backend receives (M-1)*Ct target records from the other backends, the time complexity for storing them in the secondary storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 5: Records are merged in this phase. There are Cs source records and M*Ct target records in each backend. Each block of the source records is compared and merged with all of the target records. It takes Ti to bring one block of source records into the primary memory from the secondary storage and M*(Ct/Cb)*Ti for the entire target record set.

It takes Cb*Tb to access one block of source records and M*Ct*Tb to access all of the target records. The time complexity for comparing one block of the source records with all of the target records is:

Cb*M*Ct*Tc.

We further assume that a fraction k of the target records participates in the merging operation. The time complexity for merging one block of source records with all of the target records becomes:

k*M*Ct*Tm.

The total time complexity for processing one block of source records in this implementation is:

[Ti + Ti*M*(Ct/Cb)] + (Cb + M*Ct)*Tb + (Cb*M*Ct*Tc) + (k*M*Ct*Tm).

There are Cs/Cb blocks of source records in each backend; therefore, the time complexity of this alternative is:

(Cs/Cb)*{[Ti + Ti*M*(Ct/Cb)] + (Cb + M*Ct)*Tb + (Cb*M*Ct*Tc) + (k*M*Ct*Tm)}
or
(M*Cs*Ct)*[Ti/Cb^2 + (Tb + k*Tm)/Cb + Tc] + Ti*(Cs/Cb) + Cs*Tb.

Because Cs may be equal to Ct and M is a small constant, the time complexity may be further simplified to O(Cs*Ct) or O(Cs^2).
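The algebra from the per-block cost to the closed form can be checked numerically. The sketch below evaluates both expressions; the parameter values are arbitrary placeholders, not measurements from MBDS, and serve only to show that the two forms agree and that doubling Cs = Ct roughly quadruples the cost.

```python
import math


def per_block_cost(Ct, Cb, M, Ti, Tb, Tc, Tm, k):
    """Phase 5 cost of merging one block of source records with all target records."""
    io = Ti + Ti * M * (Ct / Cb)       # one source block plus the whole target set
    access = (Cb + M * Ct) * Tb        # touch each source and target record
    compare = Cb * M * Ct * Tc         # every source record against every target record
    merge = k * M * Ct * Tm            # only a fraction k of the target records merge
    return io + access + compare + merge


def straightforward_cost(Cs, Ct, Cb, M, Ti, Tb, Tc, Tm, k):
    return (Cs / Cb) * per_block_cost(Ct, Cb, M, Ti, Tb, Tc, Tm, k)


def closed_form(Cs, Ct, Cb, M, Ti, Tb, Tc, Tm, k):
    return M * Cs * Ct * (Ti / Cb**2 + (Tb + k * Tm) / Cb + Tc) + Ti * (Cs / Cb) + Cs * Tb


params = dict(Cb=10, M=4, Ti=1.0, Tb=0.01, Tc=0.001, Tm=0.01, k=0.1)  # placeholders
a = straightforward_cost(1000, 1000, **params)
b = straightforward_cost(2000, 2000, **params)
print(math.isclose(a, closed_form(1000, 1000, **params)), b / a)
```

The ratio stays just below 4 because the small linear terms Ti*(Cs/Cb) and Cs*Tb grow only twofold.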


b. An Analysis for the Sort-Matching Implementation 


We will analyze each phase of this implementation approach.
Phase 1: Each backend sorts its two record sets and broadcasts the sorted target record set to the other backends. Due to the large number of records, the sorting operation cannot be done by using an internal sorting algorithm. There are several external sorting algorithms which can sort the local source records and the local target records with time complexities of O(Cs*log Cs) and O(Ct*log Ct), respectively. However, these algorithms all have some limitations: they either require a special hardware configuration or run different software on different processors [Refs. 9,10].

Because we do not want to put limitations on the hardware configuration of MBDS or to run different software among the backends, this alternative is eliminated from our consideration.
c. An Analysis for the Bucket-Hashing Implementation


In order to further simplify our analysis, we 
assume that the local source records and target records can 
be evenly hashed across all the buckets of the hashing 
tables and each bucket will contain only one block of local 
source records or one block of local target records. First, 
we analyze the alternative that uses only one hashing table. 

Phase 1: Each source record needs to be hashed and written into a bucket according to its hashed value. This includes getting the block of that bucket from the secondary storage, writing the record into the block, and returning the block to the secondary storage. Therefore, the time complexity for each backend to hash and store the source records is:
Cs*(Th + Tb + 2Ti).


Phase 2: Every time a target record is hashed, the bucket with that hashed value is checked. If the bucket is not empty, then all the source records in that bucket will be retrieved into the primary memory, compared with the target record, and merged with it if their common attribute values are equal. The time complexity for bringing one bucket (block) of source records into primary memory is Ti. The time complexity for accessing those source records from the block and comparing them with that target record is:
Cb*(Tb + Tc).

Suppose that the probability of hashing a target record into a non-empty bucket is p and the probability of satisfying the merging condition is f; then the time complexity for each backend to process one local target record is:
Th + p*[Ti + Cb*(Tb + Tc + f*Tm)].

Because we assume the source records are evenly hashed across the buckets of the hashing table, p is equal to 1. There are Ct local target records in each backend, so the time complexity for each backend to process its local target records is:
Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.


Phase 3: Each backend receives (M-1)*Ct target records from the other backends. The time complexity for storing those records back to the secondary storage is:
(M-1)*(Ct/Cb)*Ti.

Phase 4: It takes (M-1)*(Ct/Cb)*Ti for each backend to retrieve all the non-local target records from the secondary storage into the primary memory. The time complexity for processing those records is:
(M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.
The time complexity of this phase is:
(M-1)*(Ct/Cb)*Ti + (M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.
Summing Phases 1 to 4, the total time complexity of this alternative for a backend is:

Cs*(Th + Tb + 2Ti) + 2(M-1)*(Ct/Cb)*Ti
+ M*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.
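Phases 2 and 4 together account for M*Ct target-record probes, which is where the M*Ct factor in the total comes from. The sketch below checks that bookkeeping numerically under the text's assumption p = 1; the parameter values are placeholders, not MBDS measurements.

```python
import math


def one_table_phases(Cs, Ct, Cb, M, Th, Ti, Tb, Tc, Tm, f):
    """Per-phase costs of the one-common-table alternative for one backend (p = 1)."""
    per_target = Th + Ti + Cb * (Tb + Tc + f * Tm)   # hash, fetch bucket, compare/merge
    return [
        Cs * (Th + Tb + 2 * Ti),                               # phase 1: hash and store sources
        Ct * per_target,                                       # phase 2: local targets
        (M - 1) * (Ct / Cb) * Ti,                              # phase 3: store non-local targets
        (M - 1) * (Ct / Cb) * Ti + (M - 1) * Ct * per_target,  # phase 4: reload and process them
    ]


def one_table_total(Cs, Ct, Cb, M, Th, Ti, Tb, Tc, Tm, f):
    """Closed form of the total given in the text."""
    return (Cs * (Th + Tb + 2 * Ti) + 2 * (M - 1) * (Ct / Cb) * Ti
            + M * Ct * (Th + Ti + Cb * (Tb + Tc + f * Tm)))


p = dict(Cs=1000, Ct=1000, Cb=10, M=4, Th=0.002, Ti=1.0, Tb=0.01, Tc=0.001, Tm=0.01, f=0.1)
print(math.isclose(sum(one_table_phases(**p)), one_table_total(**p)))
```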


Now, we analyze the other alternative which uses 
two separate hashing tables. 

Phase 1: The source records and the target records are hashed, grouped into the buckets of separate hashing tables, and then placed onto the secondary storage. The time complexity for each backend to process its local records is:
(Cs+Ct)*(Th + Tb + 2Ti).

Upon receiving the target records from the other backends, each backend inserts those incoming records into the hashing table of the target records and stores them back to the secondary storage. Since those non-local target records are grouped and sent by their bucket numbers, the insertion time is so short that it may be ignored. By using an inverted list, the time complexity for each backend to return those incoming target records to the secondary storage is:
(M-1)*(Ct/Cb)*Ti.


Phase 2: Records of these two hashing tables are processed one bucket at a time. For any bucket number (i.e., a table entry), if the buckets of both hashing tables are not empty, then all blocks of the records of both buckets are read into the primary memory for the merging operation. It takes Ti to bring one bucket of source records (in this case, one block) into the primary memory and M*Ti for one bucket of target records (M blocks). The time complexity for accessing, comparing and possibly merging one bucket of source records with one bucket (M blocks) of target records (not including the disk I/O time) will be:

Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].
The expected time complexity for all buckets will be:
(Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].

Therefore, the total time complexity for this alternative is:
(Cs+Ct)*(Th + Tb + 2Ti) + (M-1)*(Ct/Cb)*Ti
+ (Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].
| Unit | One Common Table              | Two Separate Tables       |
| Th   | Cs + M*Ct                     | Cs + Ct                   |
| Tb   | Cs + M*Ct*Cb                  | (M*Cb + 2)*Cs + Ct        |
| Tc   | M*Ct*Cb                       | Cs*M*Cb                   |
| Ti   | 2Cs + M*Ct + 2(M-1)*(Ct/Cb)   | 2(Cs+Ct) + (M-1)*(Ct/Cb)  |
| Tm   | f*M*Ct*Cb                     | f*M*Cs*Cb                 |

Figure 3.3 The Time Complexities of the Bucket-Hashing Implementations.


A summary of the time complexity in terms of Th, Tb, Tc, Ti and Tm for these two subalternatives is shown in Figure 3.3. As shown in Figure 3.3, the alternative which uses two separate tables is better than the one which employs only one table; in particular, each backend hashes only its own Cs + Ct records instead of Cs + M*Ct, because the non-local target records arrive already grouped by bucket number. Since Cb and M are constants, f is smaller than 1, and Ct may be equal to Cs, we can further simplify the time complexity of the two-separate-tables subalternative to:
O(Cs+Ct) or

O(Cs).
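The per-unit coefficients of the two-separate-tables column come from expanding its total. The sketch below evaluates the total both as derived and regrouped by unit cost, using placeholder parameter values (not measurements), and confirms that doubling Cs and Ct exactly doubles the cost, i.e., the growth is linear as claimed.

```python
import math


def two_table_total(Cs, Ct, Cb, M, Th, Ti, Tb, Tc, Tm, f):
    """Total for one backend, as derived in the text."""
    return ((Cs + Ct) * (Th + Tb + 2 * Ti) + (M - 1) * (Ct / Cb) * Ti
            + (Cs / Cb) * Cb * (Tb + M * Cb * (Tb + Tc + f * Tm)))


def two_table_by_coefficient(Cs, Ct, Cb, M, Th, Ti, Tb, Tc, Tm, f):
    """The same total regrouped by unit cost (one coefficient each for Th, Tb, Tc, Ti, Tm)."""
    return ((Cs + Ct) * Th
            + ((M * Cb + 2) * Cs + Ct) * Tb
            + Cs * M * Cb * Tc
            + (2 * (Cs + Ct) + (M - 1) * Ct / Cb) * Ti
            + f * M * Cs * Cb * Tm)


p = dict(Cb=10, M=4, Th=0.002, Ti=1.0, Tb=0.01, Tc=0.001, Tm=0.01, f=0.1)
small = two_table_total(1000, 1000, **p)
large = two_table_total(2000, 2000, **p)
print(math.isclose(small, two_table_by_coefficient(1000, 1000, **p)), large / small)
```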


d. The Conclusion for Our Implementation Approach


A summary of the analysis of these implementation approaches in terms of time complexity is shown in Figure 3.4. Clearly, the one based on bucket-hashing with two separate hashing tables is the best approach. Therefore, our implementation will be based on that approach. The details of the design and implementation will be discussed in the next chapter.
| Implementation   | Time Complexity |
| Straightforward  | O(Cs^2)         |
| Sorting-Matching | Not considered  |
| Bucket-Hashing   | O(Cs)           |

Figure 3.4 Time Complexity of Different Implementations.
IV. DETAILED DESIGN FOR IMPLEMENTING RETRIEVE-COMMON
In the previous chapter, a bucket-hashing-based implementation approach was selected for implementing the retrieve-common operation in MBDS. In this chapter, we focus on specifying the details of that approach and discuss any of the existing MBDS software which will be affected by this implementation. Our primary goal is to use the existing software as much as possible and to minimize the effects which may be caused by the implementation.

The operations of the retrieve-common request may be described in four phases. First, the user's request must be preprocessed so that all backends can be informed by an appropriate message. This is the request-preprocessing phase. Second, the records of both the source and the target record sets are retrieved before the merging operation. This is the record-retrieving phase. Third, those retrieved records are hashed on the values of their join attributes and stored into a hashing table according to their hashed values (i.e., the bucket numbers). We recall that there are two hashing tables, one for the source records and one for the target records. Further, the hashed local target records are broadcast to the other backends. This is the hashing-and-storing phase. Lastly, the hashed records of the source buckets and the hashed records of the target buckets are compared and merged bucket by bucket. The merged results are sent to the controller from all of the backends. This is the merging phase. The controller then forwards those results to the host computer.

The operations of the first and second phases can be done by the existing system software with minor modifications. However, in order to accomplish the operations of the last two phases, we must design a new set of procedures, which we refer to as the hashing module. In the remainder of this chapter, we first describe the hashing module, and then the operations of those four phases.


A. THE HASHING MODULE


This module is designed to accomplish the operations of the last two phases of the retrieve-common request. There are three procedures within this module: the hashing procedure, the bucket-block tracking procedure and the merging procedure. In this section, we first discuss two different alternatives for implementing this module. After choosing the better alternative, we then describe the three procedures of the hashing module.


1. Alternatives for Implementing the Hashing Module


There are two alternatives that may be used for implementing the hashing module. In the first alternative, the hashing module is implemented as a separate process of the backend. This alternative modifies the existing process structure of a backend by introducing a sixth process and its associated communication paths into each backend. In the second alternative, the hashing module is implemented as part of the existing record processing process (RECP). This alternative leaves the existing backend process structure unchanged.

a. As a Separate Process


In this alternative, the hashing module is designed as a separate process of the backend. The inputs to the hashing module are either the local source or target records from the local RECP or the non-local target records from the RECPs of the other backends. The outputs from the hashing module are the merged results, which are sent to the controller. The transfer of records between processes (e.g., the non-local target records from Put Pcl to the hashing module, or the local source records or local target records from the local RECP to the hashing module) is accomplished using the interprocess message capabilities of each backend. The new process structure of each backend with the additional communication paths is shown in Figure 4.1. Since the hashing module is an independent process, the effects of this implementation on the other processes of MBDS may be minimized.
(Diagram of one backend: Put Pcl, Get Pcl, Record Processing, Concurrency Control, Directory Management, and the Hashing Module as a separate process.)
Figure 4.1 Hashing Module As a Separate Process.
b. As a Procedure within Record Processing


In this alternative, the hashing module is designed as a group of procedures that are added to RECP. In Figure 4.2 we show the structure of the hashing module within RECP. The local records (both the source records and the target records) are retrieved by the physical data operation of RECP in each backend. Once the records are retrieved, they are sent to the hashing module. The non-local target records are received by RECP from the other backends and then passed to the hashing module. The merged results are then sent to the controller. With modularized programming, the hashing module may be implemented independently with a minimal effect on the original RECP software.


(Diagram of RECP in each backend: Aggregate Operation, Physical Data Operation (retrieval of local records), and the Hashing Module.)
Figure 4.2 Hashing Module as Part of RECP.
c. Comparison of These Two Alternatives


Both alternatives can be easily implemented with minimal effect on the existing system. The difference between these two alternatives is the way that the local records are passed from the "physical data operation" to the hashing module. In alternative (a), the records are passed as interprocess messages. In alternative (b), the records are passed as a parameter of a procedure call. We choose alternative (b) for three reasons.

(1) Message-passing between two processes within a backend is slower than parameter-passing. In message-passing, both processes have to access a common memory to put (or get) a message. The access time, coupled with the time required for the sender to place a message in the common memory and for the receiver to fetch it, is considerable. In parameter-passing, only the logical address of the record buffer is passed between the procedures, which is much simpler and faster.

(2) Even if message-passing within a computer is extremely fast, the number of messages (i.e., records) that must be routed between the two processes is large, so the total routing cost is considerable.

(3) The extra communication paths required by alternative (a) (i.e., the communication paths between the hashing module and the other MBDS processes) increase the number of messages passed within a backend and among backends. By increasing the inter-backend and intra-backend communication, we may adversely affect the overall performance of a backend.
2. The Hashing Procedure


This procedure is used to perform the hashing operation on the values of the join attributes of the input records. The inputs to the procedure are either the local source records or the local target records, which are received from the physical-data-operation subprocess of RECP. The outputs from the procedure are the input records and their hashed values (i.e., the bucket numbers), which are sent, together with the request id, to the bucket-block tracking procedure for further processing.

The hashing operation is done by the hashing functions of this procedure. Since the values of the join attributes may be either integers or character strings, we have designed two hashing functions in this procedure. Generally, a good hashing function should satisfy the following three requirements:

(1) All of the records should be evenly distributed into the buckets of the hashing table;

(2) The chance of hashing records with different join-attribute values into the same bucket should be minimized; and

(3) The hashing computation should be fast.

These requirements are closely related to the number of buckets and the hashing algorithm which is used in the hashing function.
a. The Number of the Buckets 


A hashing table with a large number of buckets is useful for a number of reasons. First, a large number of buckets may reduce the chance of hashing different records into the same bucket. Second, the number of records in each bucket is also quite small, which reduces the access time during merging. However, it would be impractical to have a table with a very large number of bucket entries, where each bucket would contain only a few records. When the table becomes exceedingly large, a substantial cost is incurred to maintain the bucket index. The bucket index of a hashing table is an array of fixed-size bucket entries. There is a bucket entry for each bucket to keep track of the records which are stored in that bucket. Therefore, the number of buckets (and therefore of bucket entries) can be computed by the following equation:

Let X be the size of the bucket index (measured in bytes) and Y be the size of a bucket entry (measured in bytes); then the number of buckets is X / Y.

For example, if the size of the bucket index of a hashing table is 8K bytes and the size of each bucket entry is 8 bytes, then the number of bucket entries for that hashing table is 8K / 8, i.e., 1024.

How should we determine the size of the bucket index of our hashing table? Since MBDS allows the concurrent execution of different user transactions, there may be a number of retrieve-common requests being processed by the system. Each retrieve-common request requires two hashing tables, one for the source record set and one for the target record set. Because of the potentially large number of hashing tables concurrently in use, it will be necessary to store the bucket indexes of the tables in the secondary storage and stage them into the primary memory on demand. To minimize and optimize the size of the bucket index of the hashing table, it is desirable to make the size of the bucket index a multiple of the unit of disk I/O transfer. For example, if the unit of disk I/O transfer (which is typically the track size) is 4K bytes, then the size of the bucket index should be a multiple of 4K bytes. In our case, we choose 16K bytes to be the size of the bucket index of our hashing table, yielding 2048 entries (therefore, 2048 buckets) in the hashing table, each with a bucket entry size of 8 bytes.

b. The Hashing Algorithm

Since the value type of the join attribute may be either an integer or a character string, we have designed two hashing functions, one for each value type.


(1) The Hashing Algorithm for the Integer-Valued Attributes. In order to evenly distribute the values of all join attributes into the buckets and to minimize collisions, we use the information about the maximum and minimum values of the join attributes. This information is maintained in the record templates. The hashing algorithm for the integer attribute value is described as follows.


Step 1: Get the MAX (maximum) and MIN (minimum) values of the join attribute
        from the record template, and let
        X = The number of buckets in the hashing table
Step 2: If MAX - MIN < X
        then go to Step 4
        else Temp1 = (MAX - MIN) Div X
Step 3: Get the input record and let
        Y = The value of the join attribute
        bucket_number = (Y - MIN) Div Temp1
        go to Step 5
Step 4: Get the input record and let
        Y = The value of the join attribute
        bucket_number = Y - MIN
Step 5: Return the bucket number to the calling procedure.
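A minimal Python rendering of Steps 1 to 5 follows, with `lo`, `hi`, and `n_buckets` standing for MIN, MAX, and X (names chosen for this sketch, which also assumes MIN <= Y <= MAX). It departs from the steps in one flagged place: with Temp1 = (MAX - MIN) Div X as written, a range such as MAX - MIN = 4095 with X = 2048 gives Temp1 = 1, so Y = MAX lands in bucket 4095, outside the table. Adding 1 to the divisor keeps every result in the range 0 to X - 1, at the cost of leaving some high-numbered buckets unused.

```python
def int_bucket(y: int, lo: int, hi: int, n_buckets: int = 2048) -> int:
    """Bucket number for an integer join-attribute value y, assuming lo <= y <= hi."""
    span = hi - lo                         # MAX - MIN, from the record template
    if span < n_buckets:                   # Step 2 -> Step 4: one bucket per value
        return y - lo
    # Step 2 computes Temp1 = (MAX - MIN) Div X; the "+ 1" is this sketch's
    # addition, which keeps every bucket number below n_buckets.
    width = span // n_buckets + 1
    return (y - lo) // width               # Step 3


print(int_bucket(4095, 0, 4095), int_bucket(10, 5, 100))
```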


(2) The Hashing Algorithm for the Character-Valued Attributes. The record template does not provide the maximum and the minimum values for the character-valued attributes as it does for integer-valued attributes. In order to minimize collisions and distribute records evenly into buckets, we design a lookup table, which is an array with 2048 character-string elements, to perform the hashing function. The number of elements is equal to the number of entries in the bucket index of the hashing table. The values of the join attributes of the input records are searched against the contents of the lookup table to obtain the bucket values. The binary search algorithm is used to minimize the searching time of the lookup table.

The contents of the entries of the lookup
table are created in the following way:

(1) Get an English dictionary with more than 2048 pages;

(2) Divide the number of pages by the number of buckets
(in our case, 2048);

(3) Let the result be x.y, where x and y are positive
decimal digits;

(4) Pick up the last word of every x.y-th page of the
dictionary and place its first four characters as an
entry in the lookup table; and

(5) If the length of the selected word is less than 4,
fill the word with trailing blanks.

We use only the first four characters to compare the values
of join attributes for two reasons. First, we believe that
there are very few English words that will have the same
first four letters. Second, we want to reduce the
primary-memory requirements for the lookup table.
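Steps (1) to (5) can be sketched in Python, assuming (hypothetically) that the dictionary is available as a list `last_words` in which element p - 1 is the last word printed on page p. For k = 1, ..., 2048 the sketch takes page k * pages / 2048 truncated to a whole page, which is our reading of "every x.y-th page"; integer arithmetic avoids floating-point rounding on the last page.

```python
def build_lookup_table(last_words, num_buckets=2048):
    """Build the character-string lookup table from a dictionary's page-final words.

    last_words[p - 1] is the last word printed on page p (hypothetical input).
    """
    num_pages = len(last_words)
    if num_pages <= num_buckets:
        raise ValueError("step (1) needs a dictionary with more pages than buckets")
    table = []
    for k in range(1, num_buckets + 1):
        # Steps (2)-(4): the page at k * x.y, truncated to a whole page number.
        page = (k * num_pages) // num_buckets
        word = last_words[page - 1]
        # Steps (4)-(5): keep four characters, padding short words with blanks.
        table.append(word[:4].ljust(4))
    return table
```

Because a dictionary is alphabetical, the resulting entries are already in sorted order, which is what the binary search of the hashing algorithm requires.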

The algorithm for the character-valued
attributes is as follows.

Step 1: Let MIN = 0 and MAX = 2047.

Step 2: Get the input record and let
        X = the value of the join attribute.

Step 3: If X > lookup_table[MAX]
        then bucket_number = MAX; go to step 5.

Step 4: Use binary search to find the bucket number.

Step 5: Return the bucket number to the calling procedure.
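Steps 1 to 5 map naturally onto Python's standard `bisect` module, whose `bisect_left` performs the binary search of Step 4 on the sorted lookup table. The sketch assumes, since the text does not say, that a value belongs to the first bucket whose entry is greater than or equal to the value's blank-padded four-character key; `hash_string` is a hypothetical name.

```python
from bisect import bisect_left


def hash_string(value, lookup_table):
    """Return the bucket number for a character-string join-attribute value."""
    # Compare on the first four characters only, blank-padded like the table entries.
    key = value[:4].ljust(4)
    max_index = len(lookup_table) - 1          # Step 1: MIN = 0, MAX = 2047
    if key > lookup_table[max_index]:          # Step 3: beyond the last entry
        return max_index
    # Step 4: binary search for the first entry that is >= key.
    return bisect_left(lookup_table, key)
```

With a three-entry table `["cat ", "dog ", "fish"]`, for instance, "cow" falls into bucket 1 and "zebra" into the last bucket, 2.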


3. The Bucket-block Tracking Procedure


The input to this procedure may be either the local
records (either the source records or the target records)
with their bucket numbers from the hashing procedure, or the
non-local target records grouped by their bucket values from
the other backends. The outputs from the procedure are the
logical addresses of the hashing tables of the source
request and the target request, which are sent to the
merging procedure for the merging operation. The
bucket-block tracking procedure performs three functions:

(1) maintaining a global table to keep track of the
logical addresses of the hashing tables for all
retrieve-common requests which are currently being
processed in the system;

(2) maintaining a hashing table for the current request
and keeping track of all of the buckets and blocks of
that hashing table; and

(3) storing the input records into appropriate buckets and
blocks according to their bucket values.


In order to provide a better understanding of this
procedure, we first introduce the structures of the blocks,
the buckets, the hashing table and the global table. We then
discuss how these functions are accomplished.
a. The Structure of a Block 


Each block is divided into two parts: the header
and the body. The header has two fields. The first field is
used to record the length (in bytes) of the body, i.e., of
all of the records currently stored in this block. The
second field is used to store the logical address of the
next block whose records have the same bucket value as this
block. If there is no other block in the bucket, then this
field holds a null address. The body is used to store
the hashed records and their common attribute values.
Blocks which are in the same bucket are maintained as an
inverted list and tracked by their logical addresses. The
structures of the block and its header are shown in Figure
4.3.
(a) The Structure of a Block

Length of Body    Logical Address of Next Block

(b) The Structure of Block Header

Figure 4.3 The Structures of a Block and Its Header.


b. The Structure of a Bucket

As mentioned in Chapter II, instead of using
primary and overflow areas, each bucket uses fixed-size
blocks to store records. The number of blocks per bucket
may vary among different buckets. The bucket entry is used
to indicate the status and to keep track of the blocks of
that bucket.

Each bucket entry in the bucket index has two
parts: the status and the logical address of the block
currently being used. The status indicates whether or not
the bucket is empty. The size of the bucket entry is 8
bytes, where 2 bytes are used for the status and 6 bytes
are used for the logical address, which is represented by a
tuple consisting of the logical disk number, the logical
cylinder number and the logical track number. The structure
of a bucket entry is shown in Figure 4.4.


Status            The Logical Address of
of the Bucket     the Block Currently Being Used

Figure 4.4 The Structure of a Bucket Entry.


c. The Structure of the Hashing Table

A hashing table is an array of bucket entries.
We anticipate that the retrieve-common operation will be
implemented on a SUN Workstation running the UNIX operating
system, with a 16K-byte unit of disk I/O. Using the equation
from the previous subsection, we compute the number of
bucket entries for our hashing table to be 2048 (16K bytes
divided by 8 bytes per entry).
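The sizing arithmetic can be checked with a short Python sketch based on the standard `struct` module. The split of the 6-byte logical address into three 2-byte fields (disk, cylinder, track), the big-endian byte order and the status codes are assumptions for illustration only.

```python
import struct

# Assumed layout: 2-byte status, then the 6-byte logical address split into
# 2-byte logical disk, cylinder and track numbers (big-endian, unsigned).
BUCKET_ENTRY = struct.Struct(">HHHH")
EMPTY, NOT_EMPTY = 0, 1              # hypothetical status codes
TABLE_BYTES = 16 * 1024              # one 16K-byte unit of disk I/O

NUM_BUCKETS = TABLE_BYTES // BUCKET_ENTRY.size   # 16384 // 8 = 2048


def pack_entry(status, disk, cylinder, track):
    """Encode one bucket entry as 8 bytes."""
    return BUCKET_ENTRY.pack(status, disk, cylinder, track)


def unpack_entry(table_bytes, bucket):
    """Decode the entry for `bucket` from a raw hashing table."""
    status, disk, cylinder, track = BUCKET_ENTRY.unpack_from(
        table_bytes, bucket * BUCKET_ENTRY.size)
    return status, (disk, cylinder, track)
```

A freshly zeroed 16K buffer then reads as 2048 entries whose status is `EMPTY`.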


d. The Global Table

Since MBDS allows concurrent processing during
the retrieval operation, there may be several
retrieve-common requests in the system. We need a table
that keeps track of the logical addresses of the hashing
tables for all of these retrieve-common requests. Each entry
of the global table contains two parts: the request id of
the request and the logical address of the hashing table for
that request. The request id consists of the traffic id,
which is the unique identifier of a traffic unit [Ref. 11:
p. 41], and the request number, which indicates the sequence
of the request in the traffic unit. An entry of the
global table is created whenever a new hashing table is
created, and deleted when that request has completed
processing. The structure of the global table is shown in
Figure 4.5.


Request ID                       Logical Address
Traffic ID  |  Request No        of Hashing Tables

Figure 4.5 The Structure of the Global Table.
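In primary memory, the global table behaves like a dictionary keyed by the request id. The Python class below is a hypothetical sketch (its name, its methods and the tuple used as a logical address are our assumptions). It shows the three operations the text describes: a lookup for an incoming request, an insertion when a hashing table is created, and a deletion when the request completes.

```python
class GlobalTable:
    """In-memory sketch of the global table.

    Keys are request ids (traffic id, request number); values are the
    logical addresses of the hashing tables.
    """

    def __init__(self):
        self._entries = {}

    def lookup(self, traffic_id, request_no):
        """Return the hashing-table address, or None for a new request."""
        return self._entries.get((traffic_id, request_no))

    def register(self, traffic_id, request_no, table_address):
        """Add an entry when a new hashing table is created."""
        self._entries[(traffic_id, request_no)] = table_address

    def remove(self, traffic_id, request_no):
        """Delete the entry once the request has completed processing."""
        self._entries.pop((traffic_id, request_no), None)
```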


e. The Sequence of the Operations of the
Bucket-block Tracking Procedure

The steps of the sequence to accomplish the
operations of this procedure are described as follows.


Step 1: Create and initialize the global table.

Step 2: Check the request ID of the input records against
        the global table to see if the input records belong
        to a new request. If they do, then allocate a
        hashing table for that request, initialize the
        bucket index and store the logical address of the
        hashing table into the global table. Otherwise, get
        the existing hashing table into the primary memory
        using the logical address provided by the global
        table.

Step 3: Extract a record from the input buffer. If the
        record is the first record of that request, then
        go to step 10.

Step 4: If the bucket value of this record is the same as
        that of the previous one, then go to step 8.

Step 5: Store the block which contains the previous record
        back to the secondary storage.

Step 6: Get the desired bucket entry (table entry) for the
        record by its hashed bucket value. Check the
        status of the bucket. If it is "empty", then go
        to step 10.

Step 7: Get the currently used block by its logical
        address in the bucket entry.

Step 8: If there is space available in the block for
        storing this record, then go to step 12.

Step 9: Get a new block and put the current logical address
        of the bucket entry into the "logical address of
        next block" field of the block header. Then,
        update the bucket entry with the logical address
        of this new block. Go to step 12.

Step 10: Get the desired bucket entry by its hashed
         bucket value, and update the status of that bucket
         entry to "not empty".

Step 11: Get a new block and put its logical address into
         the bucket entry.

Step 12: Store the record into the block and update the
         "length of body" field of the block header.

Step 13: Repeat steps 3 to 12 until all records have
         been processed.


Notice that the block is not immediately
returned to the secondary storage after the insertion of one
input record. Since the records in MBDS are stored by
clusters, it is very likely that records within the same
cluster will be retrieved one after another. Therefore, by
keeping the current block in the primary memory, we may save
one store and one read operation if the next input record is
retrieved from the same cluster and hashed into the same
bucket (that is, if it has the same bucket value).
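The steps above can be simulated in a small Python sketch. It is illustrative only: a dictionary stands in for secondary storage, block capacity is counted in records rather than bytes (`BLOCK_CAPACITY` is hypothetical), and a bucket entry holding None plays the role of the "empty" status. As in the procedure, the block currently in use stays in memory until a record for a different bucket arrives, and a full block is chained behind a new one through the "next block" field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BLOCK_CAPACITY = 3  # hypothetical: records per block (real blocks are 16K bytes)


@dataclass
class Block:
    next_block: Optional[int]                            # header: next block of the bucket
    records: List[object] = field(default_factory=list)  # body


class HashingTable:
    """Sketch of the bucket-block tracking steps for one request.

    `disk` stands in for secondary storage (logical address -> Block);
    a bucket entry of None plays the role of the "empty" status.
    """

    def __init__(self, num_buckets: int, disk: Dict[int, Block]):
        self.disk = disk
        self.entries: List[Optional[int]] = [None] * num_buckets
        self.current_bucket: Optional[int] = None   # bucket of the block kept in memory
        self.current_address: Optional[int] = None
        self.current_block: Optional[Block] = None

    def _new_block(self, next_block: Optional[int]) -> int:
        address = len(self.disk)
        self.disk[address] = Block(next_block)
        return address

    def _store_current(self) -> None:
        # Step 5: write the in-memory block back to "disk".
        if self.current_block is not None:
            self.disk[self.current_address] = self.current_block
        self.current_bucket = self.current_address = self.current_block = None

    def insert(self, record: object, bucket: int) -> None:
        if bucket != self.current_bucket:            # step 4: a different bucket
            self._store_current()
            address = self.entries[bucket]
            if address is None:                      # steps 10-11: empty bucket
                address = self._new_block(None)
                self.entries[bucket] = address
            self.current_bucket, self.current_address = bucket, address
            self.current_block = self.disk[address]  # step 7
        if len(self.current_block.records) >= BLOCK_CAPACITY:  # steps 8-9
            self.disk[self.current_address] = self.current_block
            self.current_address = self._new_block(self.current_address)
            self.entries[bucket] = self.current_address
            self.current_block = self.disk[self.current_address]
        self.current_block.records.append(record)    # step 12

    def records_in(self, bucket: int) -> List[object]:
        """Follow the chain of blocks of one bucket, newest block first."""
        self._store_current()
        records, address = [], self.entries[bucket]
        while address is not None:
            block = self.disk[address]
            records.extend(block.records)
            address = block.next_block
        return records
```

Inserting five records into one bucket with this capacity produces two chained blocks; reading the bucket back returns the records of the newer block first.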


4. The Merging Procedure

This procedure is used to perform the merging
operation. The inputs to this procedure are the logical
addresses of the hashing tables of the source request and
the target request, which come from the bucket-block
tracking procedure. The outputs from this procedure are the
merged results, which are sent to the controller.

The algorithm of the merging procedure is as
follows.

Step 1: Reserve a result buffer.

Step 2: Get the hashing tables of the source request and
        the target request by their logical addresses.

Step 3: Compare the bucket statuses of these two hashing
        tables bucket by bucket. If both buckets contain
        records for a particular bucket number, then
        retrieve all the records associated with this
        particular bucket value from both tables.

Step 4: Apply the straightforward merging algorithm on
        those retrieved records. Insert the merged results
        into the result buffer.

Step 5: If the result buffer is full, then send its
        contents to the controller.

Step 6: Repeat steps 3, 4 and 5 until all the buckets have
        been processed.

Step 7: Free the result buffer.
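A Python sketch of the merging steps follows, assuming each hashing table is presented as a list of per-bucket record lists and that records are dictionaries sharing one join-attribute name; `merge_tables` and its parameters are hypothetical. The nested loop is the straightforward algorithm the text refers to. When a source and a target record share another attribute name, the target value wins in this sketch, and it also sends the final, partially filled buffer, which the steps leave implicit.

```python
def merge_tables(source_buckets, target_buckets, join_attr, drop_join=True,
                 buffer_size=64, send=print):
    """Bucket-by-bucket merge of two hashing tables.

    source_buckets[i] and target_buckets[i] hold the records (dicts) stored
    under bucket i; `send` receives each result buffer.
    """
    buffer = []
    for sources, targets in zip(source_buckets, target_buckets):
        if not sources or not targets:   # Step 3: skip unless both buckets hold records
            continue
        for s in sources:                # Step 4: straightforward nested-loop merge
            for t in targets:
                if s[join_attr] == t[join_attr]:
                    merged = {**s, **t}
                    if drop_join:        # join attribute was added by the parser
                        del merged[join_attr]
                    buffer.append(merged)
                    if len(buffer) == buffer_size:  # Step 5: ship a full buffer
                        send(buffer)
                        buffer = []
    if buffer:                           # our addition: ship the last partial buffer
        send(buffer)
```

Only records that hash to the same bucket are ever compared, which is the point of hashing both requests with the same function before merging.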


B. THE OPERATIONS OF THE FOUR PHASES

In this section we discuss the operations of each phase
of the retrieve-common request and the software which will
be affected by those operations.
1. The Request-preprocessing Phase
a. The Operations

The operations of this phase include parsing the
user's transaction (or request) and, if the transaction
(request) is correctly parsed, composing an appropriate
message in the controller to inform the backends to
begin execution of the request. Since the retrieve-common
request is conceptualized and executed as two retrieval
operations, the parser has to parse the user's request and
transform it from the form of a single request into the
form of a transaction with two requests.
b. The Affected Software

Basically, the operations of this phase can be done
by the existing Request Preparation process. However, the
software for this process must be modified as follows:

(1) The parser should be able to recognize the newly added
syntax and correctly parse the request;

(2) The composer should be able to form a new message to
inform PP and all of the backends so that they can
perform the desired operation;

(3) New message types are added for processing the
retrieve-common request; and

(4) PP and all of the backends should be able to recognize
and process the newly created messages for the
retrieve-common request.
2. The Record-retrieving Phase
a. The Operations

Operations of this phase include the address
generation and the record retrieval for both the source
request and the target request. These two requests are
processed by DM in the same way as the other four types of
requests. As mentioned in the previous chapter, the target
records are processed after the source records. In order to
separate the records of these two requests, DM will first
send the source request and its associated address set to
RECP, and hold the target request and its address set
until receiving a message from RECP indicating that all
source records have been retrieved.

The record-retrieving operation is performed by
the physical-data-operation subprocess in RECP as a regular
retrieve request. Instead of sending the retrieved records
to the controller, control logic is used to route them to
the hashing module for hashing and subsequent merging.
b. The Affected Software

Most of the operations of this phase are done by
DM, CC and the Physical Data Operation of RECP in each
backend. The affected software includes:

(1) We need to add control logic into DM so that the
address information of the source and target requests
will not be sent to RECP together; and

(2) We need to add a new procedure to handle the
retrieve-common request and control logic to route
the results to the hashing module instead of to PP.


3. The Hashing-and-storing Phase

This is the most important part of the
retrieve-common request. All of the records are prepared in
this phase so that they can be merged in the next phase. The
operations of the hashing-and-storing phase include:

(1) performing hashing operations on the local records,

(2) table maintenance and bucket-block tracking
operations, and

(3) broadcasting (and receiving) the target records and
their bucket values to (from) the other backends.


a. The Hashing Operations

This operation is performed by the hashing
procedure of the hashing module. Upon receiving the local
records from the previous phase, the hashing procedure will
check the record template to get the value type of the
common attribute values and then apply an appropriate
hashing function to hash the common attribute values. The
records and their hashed bucket values will then be passed
to the bucket-block tracking procedure for further
processing.
b. Table-maintenance and Bucket-block Tracking
Operation

This operation is done by the bucket-block
tracking procedure. A global table is maintained to store
the addresses of all of the hashing tables for all of the
different retrieve-common requests which are currently being
processed by the system. Whenever a new retrieve-common
request is encountered, the bucket-block tracking procedure
will create a new hashing table for that request. The
logical address of the newly created hashing table is then
stored into the global table. The hashing table will be
deleted when the request is complete. Records are stored
into buckets according to their hashed values. The
information in the bucket entries and the block headers is
maintained and updated by the bucket-block tracking
procedure as described in the previous section.


c. Broadcasting and Receiving Target Records
Between Backends

After the local target records have been hashed
and processed, each backend will buffer its local target
records (retrieved from the target-hashing table with their
bucket values) and broadcast them to the other backends.
Upon receiving those non-local target records, each backend
will store them into the target-hashing table by their
bucket values. A checklist is used to ensure that the
target information from all of the other backends has been
received.
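The grouping before the broadcast and the checklist afterwards can be sketched in a few lines of Python. Backends are assumed to be numbered 0 to n - 1, and `group_by_bucket` and `TargetChecklist` are hypothetical names; the actual message exchange between backends is omitted.

```python
from collections import defaultdict


def group_by_bucket(hashed_targets):
    """Group local (record, bucket) pairs by bucket number for broadcasting."""
    groups = defaultdict(list)
    for record, bucket in hashed_targets:
        groups[bucket].append(record)
    return dict(groups)


class TargetChecklist:
    """Tracks which other backends have sent their hashed target records."""

    def __init__(self, my_backend, num_backends):
        self.pending = set(range(num_backends)) - {my_backend}

    def mark_received(self, sender):
        self.pending.discard(sender)

    def complete(self):
        return not self.pending
```

A backend would start merging only once `complete()` returns True, that is, once every other backend has reported.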
d. The Affected Software

Since the operations of this phase are done by
the hashing module, RECP is affected only to the extent that
this module is integrated into the RECP process. No other
existing software will be affected.


4. The Merging Phase

This is the last phase of the retrieve-common
operation. The local source records and the entire set of
target records are compared and merged.
a. The Operations

The operations are performed by the merging
procedure of the hashing module. Because the records in
both tables are unsorted, they are merged by using the
straightforward algorithm. The merged results are stored in
a result buffer and then sent to the controller.

b. The Affected Software

Since this phase is also done by the hashing
module, RECP is affected only to the extent that this module
is integrated into the RECP process. No other existing
system software is affected.
V. THE IMPLEMENTATION

In this chapter, we describe how the retrieve-common
request is integrated into MBDS. To perform the integration
successfully, it is necessary to modify a portion of the
MBDS software. Therefore, this chapter also discusses how
the MBDS software is modified for the integration and
implementation of the retrieve-common operation.

In the remainder of this chapter, we first describe the
modified processes of the controller. Second, we describe
the modified processes of each backend. Then, we present
the modified MBDS message-passing facilities. Finally, we
trace the execution sequence of the retrieve-common request
in terms of the types of messages that are passed among the
MBDS processes.


A. THE MODIFIED PROCESSES OF THE CONTROLLER 


1. The Request Preparation Process (REQP)

There are two subprocesses in REQP, namely the
parser and the composer. The parser parses the requests and
checks for syntax errors. The composer transforms the
correctly parsed requests into the form required for
processing at the backends.
a. The Parser

The parser does both the lexical and the
syntactical analyses of the ABDL transactions (or requests).
The input to the parser is either a request or a
transaction. The outputs from the parser are the error
messages to the test interface, the aggregation operators to
PP and the correctly parsed requests to the composer.
The lexical analysis is done by the lexical
analyzer produced by LEX [Ref. 11: p. 42]. The input to
LEX is a specification of the tokens of the language (i.e.,
the tokens of ABDL) in the form of regular expressions and a
set of subroutines which specify the actions to be taken
upon recognition of the tokens. The syntactical analyzer is
generated by YACC (Yet Another Compiler-Compiler) [Ref. 12].
The input to YACC is a specification which includes the
declarations of the token names, the rewriting rules of the
grammar, and the action program. YACC produces a C program
to determine whether the input ABDL transactions (requests)
are syntactically correct.

For the parser to correctly parse the users'
retrieve-common requests, we have made several modifications
to the original parser subprocess. These modifications are
listed below.

(1) Regular expressions for LEX.
We have added a new set of regular expressions so
that the lexical analyzer can recognize the
retrieve-common request and generate appropriate
tokens, which in turn can be recognized and used by
YACC.

(2) Grammar rules for YACC.
A new set of rules has been added into the original
ABDL grammar so that the parser can recognize those
tokens which are generated for the retrieve-common
request and organize those tokens by these newly
created rules.

(3) The request type.
We have added a new request type, the retrieve-common
request, so that the parsed transaction can be
correctly identified and properly executed by the
composer and the other processes of MBDS.
(4) The action program.
The retrieve-common request is input to the parser
in the form of a single request. The parser should
be able to parse this request and generate a
transaction of two retrieval requests (each of the
retrieve-common request type). If the join attribute
is not in the target list (of the source or the target
request), the action program inserts the join
attribute into the head of the target list. These extra
attribute-value pairs (i.e., the join attribute-value
pairs) of the retrieved records are deleted by the
merging procedure, so that the merged results contain
only the desired attribute-value pairs. The newly added
regular expressions, grammar rules and the SSL for
the modified action program are provided in Appendix
A.
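The transformation performed by the action program can be sketched in Python rather than as YACC actions; the request representation (dictionaries with `query`, `targets` and `join` keys) and the function name are hypothetical. Each generated request records whether its join attribute was inserted, so that the merging procedure knows which attribute-value pairs to delete.

```python
def split_retrieve_common(source, target):
    """Turn one retrieve-common request into a two-request transaction.

    `source` and `target` are dicts with hypothetical keys "query",
    "targets" (the target list) and "join" (the join attribute).
    """
    transaction = []
    for role, request in (("source", source), ("target", target)):
        targets = list(request["targets"])
        join_added = request["join"] not in targets
        if join_added:
            # Put the join attribute at the head of the target list so the
            # hashing module can find it; the merge later deletes it.
            targets.insert(0, request["join"])
        transaction.append({
            "type": "RETRIEVE-COMMON",
            "role": role,
            "query": request["query"],
            "targets": targets,
            "join_added": join_added,
        })
    return transaction
```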


b. The Composer

The composer receives the correctly parsed
requests from the parser and formats them into the required
message format. Then, the composer broadcasts the formatted
messages to all of the backends for execution. We have
modified the original composer program so that the composer
can correctly reformat the retrieve-common request.


2. The Post Processing Process (PP)

The post processing process includes the aggregation
post operation and the reply monitor. The functions of PP
are described in [Ref. 11: p. 27]. The aggregation post
operation is not modified. The only modification in the
reply monitor is to recognize the new request type for the
retrieve-common request.
B. THE MODIFICATION OF THE BACKEND PROCESSES

As described in Chapter II, one of the design goals of
MBDS is to assign as much work as possible to the backends.
Consequently, there are more changes in the processes of
each backend than in the controller. The affected
processes are directory management and record processing.
1. The Directory Management Process (DM)

DM receives the new transaction message for the
retrieve-common request from the request composer and then
performs a number of directory operations, which include
attribute search, descriptor search, cluster search, address
generation and directory table maintenance. From our
earlier discussion, we know that the source and target
requests of a retrieve-common request should not be
processed concurrently by RECP. Therefore, DM will first
process the source request and send the request and its
addresses to RECP. The target request is held in DM until
RECP notifies DM that the source request has finished
execution.

At what stage of the DM processing do we hold the
target request? There are several alternatives for holding
the target request in DM. These alternatives are listed
below.

(1) Hold the target request without performing any
directory operation.

(2) Hold the target request after it completes attribute
search.

(3) Hold the target request after it completes attribute
search and descriptor search.

(4) Hold the target request after it completes attribute
search, descriptor search and cluster search.

(5) Hold the target request after it completes attribute
search, descriptor search, cluster search and address
generation.


Alternatives (2), (3), (4) and (5) will generate status and
directory information for the target request which must be
held somewhere. Due to the large number of possible
attributes, the status and directory information may be too
big to be kept in the primary memory, i.e., it will have to
be stored back to the secondary storage. The extra disk I/O
time for moving the status and directory information in and
out of the primary memory not only slows the retrieve-common
operation, but also increases the program complexity and
causes many unnecessary changes to the existing software.
Therefore, we choose alternative (1) to process the target
request.

The algorithm for the modified DM is as follows.

Step 1: Get the next message from the message queue and
        find the sender of the message.

Step 2: If the sender is the controller, then go to step 5.

Step 3: If the sender is RECP, then go to step 8.

Step 4: If the sender is CC, then go to step 11.

Step 5: If this is not a retrieve-common transaction, then
        go to step 11.

Step 6: Identify and separate the source request and the
        target request from the transaction. Hold the
        target request and perform the directory
        processing on the source request.

Step 7: Send the source request with its address set to
        RECP. Go to step 1.

Step 8: If this is not the message which indicates the
        completion of retrieving all the source records,
        then go to step 11.

Step 9: Get the corresponding target request and perform
        directory processing on that target request.

Step 10: Send the target request with its address set to
         RECP.

Step 11: Perform the original DM operation.

The SSL for the modified DM is provided in Appendix B.
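The dispatch logic of steps 1 to 11 is sketched below in Python. The message fields (`kind`, `traffic_id`, `requests`) and the three callables passed to the constructor are hypothetical stand-ins for MBDS facilities; the completion message corresponds to message type (33), Source Retrieve Finished, in Figure 5.1. Holding the target request before any directory work reflects alternative (1).

```python
class DirectoryManager:
    """Sketch of steps 1-11 of the modified DM.

    directory_processing(request) returns the request's address set;
    send_to_recp(request, addresses) and original_dm(sender, message)
    stand in for the existing DM facilities (all hypothetical callables).
    """

    def __init__(self, directory_processing, send_to_recp, original_dm):
        self.directory_processing = directory_processing
        self.send_to_recp = send_to_recp
        self.original_dm = original_dm
        self.held = {}  # traffic id -> target request, held before any directory work

    def handle(self, sender, message):
        kind = message.get("kind")
        if sender == "controller" and kind == "retrieve-common":     # steps 2, 5
            source, target = message["requests"]                      # step 6
            self.held[message["traffic_id"]] = target                 # alternative (1)
            self.send_to_recp(source, self.directory_processing(source))  # step 7
        elif sender == "RECP" and kind == "source-retrieve-finished":  # steps 3, 8
            target = self.held.pop(message["traffic_id"])             # step 9
            self.send_to_recp(target, self.directory_processing(target))  # step 10
        else:                                                         # steps 4, 11
            self.original_dm(sender, message)
```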


2. The Record Processing Process (RECP) 


RECP receives the requests and their address sets
from DM and performs the physical data operations on those
requests. The original physical-data-operation subprocess
includes a control function and a subfunction for each type
of request. The subfunctions are invoked by the control
function according to the type of request being processed.

In order to process the retrieve-common request, we
have made two modifications to RECP:

(1) adding a new subfunction, the retrieve-common
subfunction, into the physical-data-operation
subprocess; and

(2) adding a new subprocess, the hashing module, into
RECP.

a. The Retrieve-Common Subfunction

The purpose of the retrieve-common subfunction
is to direct the flow of control in the
physical-data-operation subprocess so that the
retrieve-common request can be processed correctly. The
differences between the retrieve-common subfunction and the
retrieve subfunction can be summarized as follows.

(1) The retrieve subfunction sends the retrieved records
to PP, whereas the retrieve-common subfunction
sends the retrieved records to the hashing module.

(2) In addition to sending a message to CC to indicate the
completion of the retrieval of physical data (as the
retrieve subfunction does), the retrieve-common
subfunction sends a message to notify DM that all
the source records have been processed.


The algorithm for the retrieve-common
subfunction is as follows.

Step 1: Reserve a result buffer.

Step 2: For each address in the set of tracks furnished
        by DM, fetch the track from the disk and place it
        in the track buffer in the primary memory.

Step 3: Examine the records in the buffer one by one. If
        the record is marked for deletion, disregard it.
        If the record does not satisfy the query,
        disregard it. If a record satisfies the query,
        then extract the values for the attribute names in
        the target list of the request and store this
        information in the result buffer.

Step 4: When the result buffer is full, send the contents
        of the buffer to the hashing module.

Step 5: Repeat steps 2, 3 and 4 until there are no more
        addresses for the request.

Step 6: Send a message to CC to release the lock for this
        request. If this is a source request, then send a
        message to DM so that DM can process the target
        request.

Step 7: Free the result buffer.

The SSL for the modified control function and the
retrieve-common subfunction is provided in Appendix C.
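A Python sketch of the retrieve-common subfunction follows. Tracks are modeled as lists of dictionaries, a hypothetical `deleted` key marks records flagged for deletion, and the query is passed in as a predicate; the three sending callables stand in for MBDS messages. Unlike the steps above, the sketch also passes on a final, partially filled buffer.

```python
def retrieve_common(tracks, satisfies, target_list, is_source,
                    to_hashing, to_cc, to_dm, buffer_size=64):
    """Sketch of the retrieve-common subfunction (steps 1-7).

    tracks: iterable of tracks, each a list of records (dicts) already fetched
    from the addresses furnished by DM; satisfies(record) evaluates the query.
    to_hashing, to_cc and to_dm are hypothetical message-sending callables.
    """
    buffer = []                                          # step 1
    for track in tracks:                                 # step 2
        for record in track:                             # step 3
            if record.get("deleted") or not satisfies(record):
                continue
            buffer.append({name: record[name] for name in target_list})
            if len(buffer) == buffer_size:               # step 4
                to_hashing(buffer)
                buffer = []
    if buffer:                                           # our addition: partial buffer
        to_hashing(buffer)
    to_cc("release-lock")                                # step 6
    if is_source:
        to_dm("source-retrieve-finished")
```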
b. The Hashing Module

The hashing module performs the hashing and
merge operations. The merged results are sent to the
controller. The module is invoked by the retrieve-common
subfunction of the physical-data-operation subprocess.
There are three procedures within this module: the hashing
procedure, the bucket-block tracking procedure and the
merging procedure.


(1) The Hashing Procedure. The hashing
procedure receives the records from the retrieve-common
subfunction of the physical-data-operation subprocess and
performs the hashing function on the value of the join
attribute of each record. The records and their hashed
results are stored in a result buffer. When the buffer is
full, its contents are passed to the bucket-block tracking
procedure for further processing.

The algorithm for the hashing procedure is
as follows.


Step 1: Reserve a result buffer.

Step 2: Get the data type of the value of the join
        attribute from the record template.

Step 3: Extract a record from the input buffer which is
        passed from the retrieve-common subfunction.

Step 4: Apply the appropriate hashing function to hash the
        value of the join attribute of the record
        according to its data type (see Chapter IV).

Step 5: Store the record and the hashed bucket value in
        the result buffer.

Step 6: If the result buffer is full, then send the
        contents of the result buffer to the bucket-block
        tracking procedure.

Step 7: Repeat steps 3, 4, 5 and 6 until there are no more
        records in the input buffer.

Step 8: Free the result buffer.

The SSL for the hashing procedure is provided in Appendix D.
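The hashing procedure reduces to a loop that tags each record with its bucket number. In this hypothetical Python sketch the hashing functions are supplied in a dictionary keyed by data type (for example, integer and string functions like those described in Chapter IV), and `to_tracking` stands in for the bucket-block tracking procedure.

```python
def hashing_procedure(input_buffer, join_attr, data_type, hash_functions,
                      to_tracking, buffer_size=64):
    """Hash each record's join-attribute value and pass (record, bucket) pairs on.

    hash_functions maps a data type to a function from attribute value to
    bucket number; to_tracking receives each result buffer (hypothetical names).
    """
    hash_fn = hash_functions[data_type]      # step 2: choose by the template's data type
    result = []                              # step 1: result buffer
    for record in input_buffer:              # step 3
        bucket = hash_fn(record[join_attr])  # step 4
        result.append((record, bucket))      # step 5
        if len(result) == buffer_size:       # step 6: pass on a full buffer
            to_tracking(result)
            result = []
    if result:                               # our addition: pass on the remainder
        to_tracking(result)
```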


(2) The Bucket-block Tracking Procedure. This
procedure stores the records (both the source records and
the target records) into blocks according to their bucket
values, and maintains one hashing table for the currently
processed request and one global table to store the logical
hashing-table addresses for all of the retrieve-common
requests in the system. The inputs to this procedure are the
records and their hashed bucket values, which come either
from the local hashing procedure or from the other backends.
A checklist is used to ensure that the hashed results of the
non-local target records are received from all of the other
backends. There is also an additional disk I/O buffer used
in this procedure to move the blocks of each bucket into and
out of the primary memory. The outputs from this procedure
are the logical addresses of the two hashing tables of the
source request and the target request, which are passed to
the merging procedure. The structures of the global table,
hashing table, bucket and block have been described in
Chapter IV. After processing all of the local records, this
procedure groups the local target records together with
their bucket numbers and then broadcasts them to all of the
other backends.


The algorithm for this procedure is as
follows.

Step 1: Create the global table and reserve a disk I/O
        buffer.

Step 2: Get an input buffer of records. If the input
        buffer contains source records, then go to step 5.

Step 3: If the input buffer contains local target records,
        then go to step 6.

Step 4: If the input buffer contains the target records
        received from the other backends, then go to step
        8.

Step 5: Get the hashing table for the source request. Go
        to step 7.

Step 6: Get the hashing table for the target request.

Step 7: Store the records into buckets and perform the
        bucket-block tracking operation (as described in
        Chapter IV). Go to step 9.

Step 8: Perform the bucket-block tracking operations to
        insert these incoming records into the target
        hashing table.

Step 9: Repeat steps 2 to 8 until all records have been
        processed.

Step 10: If the input buffer contains local target
         records, then retrieve the local target records
         from the target hashing table bucket by bucket
         and broadcast them (with the bucket number) to
         the other backends.

Step 11: If the input buffer contains non-local target
         records, then get the logical address of the
         hashing table of the source request. Pass the
         logical addresses of the hashing tables of the
         source request and the target request to the
         merging procedure for the merging operation.

The SSL for this procedure is provided in Appendix E.
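The routing part of this procedure (steps 2 to 8) is sketched below in Python. The buffer labels, the list-of-lists tables and the returned next-step strings are hypothetical simplifications, and the bucket-block bookkeeping inside each table is omitted.

```python
# What the procedure does after buffers of each kind (steps 9-11).
NEXT_STEP = {
    "source": "wait for the target records",
    "local-target": "broadcast the local target records",      # step 10
    "remote-target": "merge once all backends have reported",  # step 11
}


def route_buffer(kind, pairs, source_table, target_table):
    """Store one input buffer of (record, bucket) pairs in the right table."""
    if kind not in NEXT_STEP:
        raise ValueError(f"unknown buffer kind: {kind!r}")
    # Step 5 selects the source table; steps 6 and 8 select the target table.
    table = source_table if kind == "source" else target_table
    for record, bucket in pairs:    # steps 7 and 8
        table[bucket].append(record)
    return NEXT_STEP[kind]
```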


(3) The Merging Procedure. This procedure does 


three functions: 


(1) fetching the hashing tables of the source reguest and 


the target reguest by their logical addresses which 
have been  prcvided by the bucket-block tracking 
procedure; 
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(2) performing the merging operation on the records of 
both hashing tables (as described in chapter IV); and 
(3) sending the merged results to the controller. 

The merged results contain only the attribute-value pairs
whose attribute names are specified in the target lists (of
either the source request or the target request). The extra
attribute-value pairs (i.e., the join attributes and their
values, which have been added to the target lists by the
parser) are deleted by this procedure.


The SSL for the merging procedure is provided in Appendix E.
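The bucket-by-bucket merge can be illustrated with a small C sketch. It is hypothetical: it assumes both hashing tables were built with the same bucket function, so only records with the same bucket number can share a common value. Within a bucket every source record is compared with every target record, and each matching pair is emitted with the join attribute dropped, mirroring the projection described above. The `Row`, `Table`, `Merged` and `merge_tables` names are assumptions, not MBDS identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 4
#define BUCKET_CAP 8

/* Hypothetical hashed record: the join value plus one projected attribute value. */
typedef struct {
    int common_value;  /* join attribute, removed from the output */
    int value;         /* attribute named in the request's target list */
} Row;

typedef struct {
    Row rows[NBUCKETS][BUCKET_CAP];
    size_t count[NBUCKETS];
} Table;

/* One merged result: projected values from a source and a target record. */
typedef struct {
    int source_value;
    int target_value;
} Merged;

/* Compare the two tables bucket by bucket and write matching pairs to out.
   Returns the number of results written (at most out_cap). */
static size_t merge_tables(const Table *source, const Table *target,
                           Merged *out, size_t out_cap) {
    assert(out != NULL || out_cap == 0);
    size_t n = 0;
    for (size_t b = 0; b < NBUCKETS; b++) {
        for (size_t i = 0; i < source->count[b]; i++) {
            for (size_t j = 0; j < target->count[b]; j++) {
                const Row *s = &source->rows[b][i];
                const Row *t = &target->rows[b][j];
                if (s->common_value == t->common_value && n < out_cap) {
                    out[n].source_value = s->value;  /* join attribute dropped */
                    out[n].target_value = t->value;
                    n++;
                }
            }
        }
    }
    return n;
}
```

Because each backend holds a disjoint portion of the source records but the entire set of target records, every matching source-target pair is produced by exactly one backend, so the controller only has to concatenate the backends' results.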


C. THE MODIFIED MESSAGE-PASSING FACILITIES 


In Chapter II we introduced the general format and the
different types of MBDS messages (see Figure 2.3 and
Figure 2.8). To accomplish the retrieve-common request we
have added two new message types, which are shown in
Figure 5.1.


D. EXECUTION OF A RETRIEVE-COMMON REQUEST--VIEWED VIA 
MESSAGE-PASSING 


In this section we describe the sequence of actions for
executing the retrieve-common request as it moves through
MBDS. The sequence of actions is described in terms of the
types of messages passed between the MBDS processes: REQP,
PP, DM, RECP and CC. The order in which messages are passed
is denoted alphabetically ('a' is first). The digit
following the ordering letter is the message type as
shown in Figures 2.4 and 5.1.

The sequence of actions for a retrieve-common request is
shown in Figure 5.2. First, the retrieve-common request
comes to REQP from the host (a1). REQP sends two messages
to PP: the number of requests in the transaction (b3) and
the aggregate operator of the request (c4). The third
message sent by REQP is the parsed traffic unit, which goes
to DM in the backends (d6). DM sends the type-C attributes
needed by the retrieve-common request to CC (e20). Once an
attribute is locked and descriptor search can be performed,
CC signals DM (f26). DM then processes the source request
(the target request is held for now). DM performs
descriptor search and signals CC to release the lock on
that attribute (g23). DM sends the descriptor ids for the
request to the other backends (h15). The DM processes in
the other backends send their descriptor ids to the DM
process residing in this backend (i15). DM then uses its
own descriptors and the descriptors received from the other
backends to form descriptor-id groups. DM now sends the
descriptor-id groups for the source request to CC (j21).
Once the descriptor-id groups are locked and cluster search
can be performed, CC signals DM (k27). DM then performs
cluster search and signals CC to release the locks on the
descriptor-id groups (m25). Next, DM sends the cluster ids
for the retrieval to CC (n22). Once the cluster ids are
locked and the request can proceed with address generation
and the rest of the source-request execution, CC signals DM
(o28). DM then performs address generation and sends the
source request and the address set to RECP (p16). Once the
retrieval request has executed properly, RECP sends a
message to DM to start processing the target request (q33).
DM processes the target request in the same way as the
source request (i.e., phases e20 to p16). The retrieved
records are processed by the hashing module in RECP. Once
the local target records have been processed properly, the
hashing module broadcasts the hashed target records
(grouped by bucket numbers) to the other backends via RECP
(s34). The hashing modules in the other backends send their
hashed target records to the hashing module of this backend
(t34). Once the comparing and merging operations have been
performed by the hashing module, the results are sent to PP
(u2). PP then forwards the results to the host (v2).


   Message Type : (32) Hashed Target Records
   Source       : Record Processing
   Destination  : Record Processing (other backends)
   Explanation  : This message contains the bucket numbers
                  of the target hashing table and all of
                  the target records associated with
                  their buckets.

   Message Type : (33) Source Retrieve Finished
   Source       : Record Processing
   Destination  : Directory Management (same backend)
   Explanation  : This message is used to notify Directory
                  Management that all of the source
                  records have been retrieved. DM can then
                  begin processing the target request.

          Figure 5.1 The New MBDS Message-Types.


   Figure 5.2 The Sequence of Messages for Executing a
              Retrieve-common Request.
VI. CONCLUSION


A. REVIEW AND SUMMARY


The multi-backend database system (MBDS) in the
Laboratory for Database System Research at the Naval
Postgraduate School is designed to overcome the
performance-gain and capacity-growth problems of either the
traditional database system or the
single-backend-software database system. The original MBDS
supported four primary operations, namely, RETRIEVE, DELETE,
UPDATE and INSERT. This thesis presented the design and
implementation of the fifth primary operation, the
RETRIEVE-COMMON operation. The retrieve-common operation is
used to merge two files by common attributes. Our major
goal is to maximize the utilization of the existing system
and to minimize the effects on it.

We have analyzed several possible design alternatives
and then selected the best one for our design and
implementation approach. The key issues for the selection
are the cohesion with the design requirements, the design
issues of MBDS and the time complexities of implementation.
Our design and implementation is based on the bucket-hashing
approach. Each backend performs a partial merge with its
portion of source records and the entire set of target
records, sending its results to the controller. The
controller forwards the final results to the user at the
host computer.

Based on the selected design and implementation
approaches, the operations of the retrieve-common request
are executed in four phases: the request-preprocessing
phase, the record-retrieving phase, the hashing-and-storing
phase and the merging phase. The retrieve-common request
is first parsed by the parser into a transaction of two
retrieval requests (each of the retrieve-common request
type). Then, the parsed requests are reformatted into the
required message formats and broadcast to all the backends
by the composer of the controller. Each backend receives
the formatted messages of the transaction, separates the
source request and the target request, and then performs the
directory operations and retrieves the records according to
the queries specified in the requests. The retrieved
records of the source record set and the records of the
target record set are separately hashed on their common
attribute values and then stored into buckets of the source
hashing table and the target hashing table, respectively.
The hashed records of the source buckets and the records of
the target buckets are compared and merged bucket-by-bucket.
The merged results are sent to the controller from all of
the backends. The controller then forwards the results to
the host computer. In order to accomplish the operations of
the retrieve-common request, we have designed a hashing
module into the record-processing process of each backend.
To integrate our design into MBDS, we have made
several modifications. These are:

(1) the message-passing facilities,
(2) the parser of the request-preparation process of the
    controller, and
(3) the directory-management process and the
    record-processing process of each backend.

The algorithms for the modifications and the program
specifications (SSL) are also provided in Chapters IV and V
and in the Appendices.
B. FUTURE WORK


The next step in the design and implementation of the
retrieve-common operation is the modification of the MBDS
software according to the SSL given in the appendices. There
are two classes of modifications. First, existing software
must be updated to reflect the changes necessary for the
retrieve-common operation: new message types must be
defined, the request-preparation and post-processing
processes of the controller must be changed, and the
directory-management process must be changed to correctly
sequence and execute the retrieve-common request. Second,
new software must be written to handle the processing of the
retrieve-common request, i.e., the hashing module. The
software for the hashing module must be coded, tested,
and integrated into the record-processing process of each
backend.
APPENDIX A
THE MODIFIED REQUEST PREPARATION PROGRAM SPECIFICATIONS


In this appendix, we present only the modified portions
of the Request Preparation process. The original SSL is in
[Ref. 11 : p.87].


A. THE LEX MODIFICATIONS 


/*****************************************************************
 * We have added the regular expression for the token
 * COMMON into LEX. The rest of LEX remains unchanged.
 * The original specification is in the lsrc file.
 *****************************************************************/


(The original lsrc specifications.)


BY      {
            return(TOKBY);
        }

COMMON  {
            return(TOKCOM);
        }


"cz ( 
return (LZ); 


} 


(The original lsrc specifications.)
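Conceptually, the new LEX rule only adds one more keyword-to-token mapping. The C sketch below illustrates that idea with a lookup table. It is not generated LEX code: the `Keyword` table and `lookup_keyword` are hypothetical, and the numeric token values are assumptions (in the real parser they come from the YACC `%token` declarations).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; YACC would normally assign these. */
enum { TOKUNKNOWN = 0, TOKBY = 257, TOKCOM = 258, TOKRETRIEVE = 259 };

typedef struct {
    const char *word;
    int token;
} Keyword;

/* COMMON is the only entry added for the retrieve-common request. */
static const Keyword keywords[] = {
    {"BY", TOKBY},
    {"COMMON", TOKCOM},
    {"RETRIEVE", TOKRETRIEVE},
};

/* Return the token for a keyword, or TOKUNKNOWN (matching is case-sensitive). */
static int lookup_keyword(const char *word) {
    assert(word != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(word, keywords[i].word) == 0) {
            return keywords[i].token;
        }
    }
    return TOKUNKNOWN;
}
```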
B. THE YACC MODIFICATIONS


In this section, we present only the SSL for the
modified portion of the parser. The original program is in
the ysource file.
procedure yyparse();

/*****************************************************************
 * This procedure is used to parse the output of LEX.
 * The modification of the yyparse procedure converts
 * the retrieve-common request from a single request
 * into a transaction of two requests.
 *
 * Data structures and variables used in this procedure:
 * 1. No new data structures are introduced by this
 *    modification.
 * 2. com_flag, com_flag_1, com_flag_2, com_flag_3:
 *    Boolean variables which are used to indicate the
 *    different conditions of the retrieve-common request.
 * 3. new_tbl_ptr:
 *    A pointer to a request table. The request table is
 *    defined in the commdata.def file as a REQtbl
 *    definition structure.
 * 4. com_atrb_1, com_atrb_2:
 *    Character strings to hold the common attribute.
 *****************************************************************/
/* The following is the modified portion of ysource. */


/* Add a new token in the specification. */


%token <str> TOKCOM     /* common */
/* Add new derivations and program specifications. */
transaction : beg_tran lines
            /* No changes in this part  */
            /* of the transaction rule. */
            | beg_single_req line
                if com_flag
                then
                    /* This is a retrieve-common request. */
                    Perform the operations which are
                    specified under the beg_tran lines;
                else
                    /* Perform original operations. */
                end if;
              end_req $ EOR
                /* Clear the com_flags. */
                com_flag = false;
                com_flag_3 = false;


req_forms : delete query
          | ...   /* These are the original derivations. */
          | req_forms common target_list req_forms;


common : TOKCOM
        perform CHECK_REQUEST_TYPE(req_tbl, OK);
        /* Check if the first request is a retrieve. */
        if OK
        then
            com_flag = com_flag_1 = true;
        else
            perform ERROR_PROCEDURE;
        end if;

attribute : LETTERFIRST
        if com_flag_1
        then
            /* This attribute is the common attribute of the
               source request. Copy the attribute into
               com_atrb_1. */
            perform strcpy(com_atrb_1, attribute);
            /* Put the common attribute of the source request
               into the target list and convert the request
               table from the form of a single request to the
               form of a transaction. */
            perform CONVERT(tbl_ptr->req_tbl, com_atrb_1,
                            traf_id, req_cnt,
                            new_tbl_ptr->req_tbl);
            com_flag_2 = true;
            com_flag_1 = false;
            /* com_flag = true */
        else
            if com_flag_2
            then
                /* This attribute is the common attribute of
                   the target request. */
                perform strcpy(com_atrb_2, attribute);
                com_flag_3 = true;
                com_flag_2 = false;
            else
                if com_flag_3 = true
                then
                    /* This is the first attribute of the
                       target list of the target request. */
                    insert com_atrb_2 into the target request table;
                    insert the attribute into the target request table;
                end if;
                /* Perform the original operations. */
            end if;
        end if;


retrieve : TOKRETRIEVE
        if com_flag_3
        then
            perform ERROR_PROCEDURE;
        else
            if com_flag
            then
                /* Change the type to be RETRIEVE_COMMON. */
            end if;
        end if;
        /* Perform the original operations. */


delete : TOKDELETE
        if com_flag
        then
            perform ERROR_PROCEDURE();
        else
            /* Perform the original operations. */
        end if;


insert : TOKINSERT
        if com_flag
        then
            perform ERROR_PROCEDURE();
        else
            /* Perform the original operations. */
        end if;


update : TOKUPDATE
        if com_flag
        then
            perform ERROR_PROCEDURE();
        else
            /* Perform the original operations. */
        end if;


/* Perform the original operations. */


end procedure yyparse;


procedure CONVERT(input: source_req_table, source_com_atr,
                         traf_id, request_number,
                         index_req_ptr;
                  output: target_req_table, request_number,
                          index_req_ptr);
/*****************************************************************
 * This procedure is used to rearrange the contents of the
 * request table of a request which is the source retrieve of
 * a RETRIEVE-COMMON request.
 * This procedure performs the following tasks:
 * 1. Rearrange the source request table.
 * 2. Make the common attribute of the source request the
 *    first attribute of the target list.
 * 3. Create a request table for the target request and
 *    return it to the calling procedure.
 *
 * Data structures and variables used in this procedure are:
 * 1. source_req_table, target_req_table:
 *    The request tables of the source request and the
 *    target request.
 * 2. new_table:
 *    An array of Reqtbl definition structures.
 * 3. traf_id:
 *    A character string which is the traffic id of a
 *    transaction.
 * 4. request_number:
 *    An integer which is used to indicate the number of
 *    requests in a traffic unit.
 * 5. index_req_ptr:
 *    A pointer to a parsed traffic unit, which is an array
 *    of Reqtbl definition structures.
 * 6. source_com_atr:
 *    A character string which is the common attribute of
 *    the source request.
 *****************************************************************/


/* Use a new request table, new_table, to hold the
   contents of the source_req_table. */

new_table¡ 0] Bene 

new table[1] 

new_tablef 2] 

new_table; 3 } 

new_table{ 4] 


str” to nun (trate 


request number; 


rcuttvpe; /* Defined in yyparse().*/ 

RETRIEVE COMMON; 

/* Copy the contents of the source request table into 
the new table. */ 


l SEE 
repeat
    new_table[i] = source_req_table[i];
    i = i + 1;
until source_req_table[i] = EOQ;
/* Insert the common attribute into the new table. */
new_table[i] = source_com_atr;
i = i + 1;

/* Copy the rest of the source_req_table into
   the new table. */
repeat
    new_table[i] = source_req_table[i-1];
    i = i + 1;
until source_req_table[i-1] = null;
/* Put an end-of-request marker, EOR,
   into the new table. */
new_table[i] = EOR;

/* Copy the new table into the source_req_table. */
i = 0;
repeat
    source_req_table[i] = new_table[i];
    i = i + 1;
until source_req_table[i] = EOR;

/* Increase the request number, and create a request
   table for the target request. */
request_number = request_number + 1;
perform ALLOCATE_REQ_TABLE(target_req_table);

/* Put the target_req_table into the
   parsed traffic unit. */
index_req_ptr->req_tbl[request_number - 1]
    = target_req_table;

/* Return the request_number, target_req_table and
   index_req_ptr to the calling procedure. */
end procedure CONVERT;
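The central task of CONVERT, making the common attribute the first attribute of the source request's target list, can be sketched over an array of strings. This is a hypothetical model, not the MBDS Reqtbl layout: a request is assumed to be a token sequence whose query ends with an `EOQ` marker and whose target list follows until an `EOR` marker, so the common attribute is inserted immediately after `EOQ`. The marker spellings and `insert_common_attribute` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EOQ "EOQ" /* hypothetical end-of-query marker */
#define EOR "EOR" /* hypothetical end-of-request marker */

/* Copy the request `in` (terminated by EOR) into `out`, inserting
   `common_attr` immediately after the EOQ marker so that it becomes the
   first attribute of the target list. Returns the number of entries
   written (including EOR), or 0 if `out` is too small or the request
   has no EOQ before EOR. */
static size_t insert_common_attribute(const char *const *in,
                                      const char *common_attr,
                                      const char **out, size_t out_cap) {
    assert(in != NULL && common_attr != NULL && out != NULL);
    size_t i = 0, o = 0;
    int inserted = 0;
    for (;;) {
        if (o >= out_cap) return 0;
        out[o++] = in[i];
        if (strcmp(in[i], EOR) == 0) break;
        if (!inserted && strcmp(in[i], EOQ) == 0) {
            if (o >= out_cap) return 0;
            out[o++] = common_attr;
            inserted = 1;
        }
        i++;
    }
    return inserted ? o : 0;
}
```

The SSL additionally copies the rearranged table back into `source_req_table` and allocates the target request table; the sketch leaves both steps to the caller.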
procedure CHECK_REQUEST_TYPE(req_tbl, OK);
/*****************************************************************
 * This procedure is used to check the syntax of a
 * retrieve-common request. If the request type is
 * not retrieve, set OK to false. Otherwise, set OK
 * to true. Return OK to the calling procedure.
 *****************************************************************/


end procedure CHECK_REQUEST_TYPE;


procedure ERROR_PROCEDURE();
/*****************************************************************
 * This procedure is used whenever there is a syntax
 * error in the request.
 * This procedure will print an error message and
 * terminate the parser operations.
 *****************************************************************/


end procedure ERROR_PROCEDURE;
APPENDIX B
THE MODIFIED DIRECTORY MANAGEMENT PROGRAM SPECIFICATIONS


The original SSL for the Directory Management process is
in [Ref. 13 : p. 82-102]. In this appendix, we present only
those procedures which are affected by the retrieve-common
request.


procedure DM_ParsedTrafUnit();
/*****************************************************************
 * This procedure is used when Request Preparation
 * (REQP) sends a traffic unit to Directory
 * Management (DM). The original procedure is in
 * the tu.c file.
 * We add an if statement to differentiate between
 * the retrieve-common request type and the other
 * request types.
 * No new variables are introduced in this procedure.
 *****************************************************************/

/* Get a pointer to the parsed traffic unit. */
ti_ptr = DM_R$ParsedTrafUnit();

/* Get a pointer to the record template
   of this traffic unit. */
tmpl_ptr = get_tmpl_ptr(ti_ptr->ti_dbid);

/* Get a pointer to the attribute table. */
AT = AT_lookuptbl(ti_ptr->ti_dbid);

/* Get the type-C attributes for the traffic unit
   and send them to CC. */
perform DM_TypeC_Attrs_TrafUnit();

/* Process the requests of this traffic unit. */
ri_ptr = pointer to the first request of this traffic unit;
/* Get the type of the first request of
   this traffic unit. */
if req_type = RETRIEVE_COMMON
then
    /* We will only process the source request.  */
    /* The target request will not be processed  */
    /* until the record-processing process has   */
    /* retrieved all of the source records.      */
    /* Perform the descriptor search processing. */
    done = NINS_SR_DESC(&rid, ri_ptr, tmpl_ptr, AT);
    if done
    then
        /* Broadcast the descriptor ids to the
           other backends. */
        DM_Broadcast_DIDs(&rid);
    end if;
else
    /* This is not a retrieve-common transaction, so
       process the requests of the traffic unit
       one-by-one. */
end if;


end procedure DM_ParsedTrafUnit;


procedure DM_RecP_Msg();
/*****************************************************************
 * This procedure is used when there is a message
 * for DM from RECP (in the same backend).
 * We add a new message type to indicate that all
 * of the source records have been retrieved.
 * No new data structures or variables are used.
 * The original procedure is in the dirman.c file.
 *****************************************************************/

/* Get the message type. */
MsgType = DM_R$Type;
switch (MsgType)
    case OldNewValue:
        perform DM_OldNewValues();
    case UpdFinished:
        perform DM_UpdFinished();
    case Source_finished:
        /* This is the message which indicates the
           completion of the retrieval of all the
           source records. */
        perform DM_Source_finished(msg);
end switch;


end procedure DM_RecP_Msg;


procedure DM_Source_finished(input: message);
/*****************************************************************
 * This procedure is used when DM receives a message
 * from RECP which indicates the completion of the
 * retrieval of all of the source records. DM is now
 * ready to process the target request.
 * This procedure is called by DM_RecP_Msg().
 *****************************************************************/

/* Receive the request id from the message. */
perform DM_R$Rid(source_req_id);

/* Get a pointer to the traf_info entry by the
   source_req_id. */
ti_ptr = DM_TiFind(source_req_id);

/* Get a pointer to the req_info entry for the source
   request. */
source_req_info_ptr = DM_RiFind(source_req_id, ti_ptr);

/* Get a pointer to the req_info entry for the target
   request by the source_req_info_ptr. */
target_ri_ptr = source_req_info_ptr->next_req_info;

/* Get the request id of the target request. */
target_req_id = Find_request_id(target_ri_ptr);

/* Perform the directory operations on the
   target request. */
/* Get the record template for the target request. */
tmpl_ptr = get_tmpl_ptr(ti_ptr->ti_dbid);
/* Get a pointer to the attribute table. */
AT = AT_lookuptbl(ti_ptr->ti_dbid);
/* Perform the descriptor search processing. */
done = NINS_SR_DESC(&rid, target_ri_ptr, tmpl_ptr, AT);
if done
then
    /* Broadcast the descriptor ids to the other
       backends. */
    perform DM_Broadcast_DIDs(&rid);
end if;


end procedure DM_Source_finished;
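The net effect of the DM changes is that only the source request enters directory processing at first, and the target request is released by the Source Retrieve Finished message from RECP. The C sketch below models just that sequencing. It is hypothetical: the `TrafficUnit` structure, the `dm_*` function names and the 0/1 indexing of the source and target requests are stand-ins for the MBDS traffic-unit and request-info tables, and descriptor search is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RETRIEVE, RETRIEVE_COMMON } ReqType;

/* Hypothetical two-request traffic unit: index 0 is the source, 1 the target. */
typedef struct {
    ReqType type;
    bool started[2]; /* has directory processing begun for request 0 / 1? */
} TrafficUnit;

/* Stand-in for descriptor search followed by broadcasting descriptor ids. */
static void dm_start_request(TrafficUnit *tu, int req_no) {
    tu->started[req_no] = true;
}

/* A parsed traffic unit arrives from REQP. */
static void dm_parsed_traf_unit(TrafficUnit *tu) {
    if (tu->type == RETRIEVE_COMMON) {
        dm_start_request(tu, 0);  /* source only; the target is held */
    } else {
        dm_start_request(tu, 0);  /* other transactions: requests proceed one-by-one */
        dm_start_request(tu, 1);
    }
}

/* The Source Retrieve Finished message arrives from RECP. */
static void dm_source_finished(TrafficUnit *tu) {
    assert(tu->type == RETRIEVE_COMMON && tu->started[0]);
    dm_start_request(tu, 1);      /* the target is the request after the source */
}
```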
APPENDIX C
THE MODIFIED RECORD PROCESSING PROGRAM SPECIFICATIONS


In this part of the appendix, we have added the
retrieve-common subfunction into the control function of the
physical-data-operation subprocess of the record-processing
process (RECP). We present only the modified portion
of the original RECP in this appendix.


procedure ReqProcessing(input: MsgType);
/*****************************************************************
 * This procedure is used to process requests according
 * to the request type.
 * We add the retrieve-common request type into the
 * switch statement as one of the optional cases.
 * This procedure is called by the procedure RP_DM. The
 * original procedure is in the reproc.c file.
 *****************************************************************/

/* Get the request type. */
switch (request_type)
    case RETRIEVE_COMMON:
        perform ST_RetDel();
        /* From this point, we use the same
           procedures as used for the
           RETRIEVE request processing. */
/* Now, back to the original ReqProcessing(). */


end procedure ReqProcessing;
procedure RP_ReadCompleted();
/*****************************************************************
 * This procedure is used when a physical read is
 * completed. We add the retrieve-common request
 * type into its switch statement as one of the
 * request-type cases.
 * This procedure is called by the procedure RP_RP.
 * The original procedure is in the recproc.c file.
 *****************************************************************/

/* Get the request type of this request. */
switch (request_type)
    case RETRIEVE_COMMON:
        perform EC_Ret();
    case RETRIEVE:
        perform EC_Ret();
    /* Now, back to the original processing. */
end switch;


end procedure RP_ReadCompleted;


procedure RB$SEND_COMPLETION(input: RB_ptr, reqtype);
/*****************************************************************
 * This procedure does the following tasks:
 * 1. Send the contents of the result buffer to either
 *    the hashing module or the controller, depending
 *    on the request type.
 * 2. If this is a source request of a retrieve-common
 *    request, then send a message to DM indicating
 *    that all of the source records have been
 *    retrieved.
 * 3. Send a message to CC to release the locks on
 *    the database for this request.
 * 4. Free the result buffer space after the contents
 *    of the result buffer have been sent.
 * The data structures and variables are the same as in
 * the original procedure.
 * This procedure is called by the procedure EC_Ret().
 * The original procedure is in the recproc.c file.
 *****************************************************************/

/* Get the request id by the result buffer pointer
   RB_ptr. */
request_id = RB_ptr->RB_Rid;

if reqtype = RETRIEVE_COMMON
then
    if the result_buffer is full
    then
        /* Send the contents of the result buffer */
        /* to the hashing module and reinitialize */
        /* the buffer size to 0.                  */
        perform HASH_FUNC(request_id, result, result_length);
        result_length = 0;
    end if;
    if this is the last result buffer for this request
    then
        /* Send the result buffer to the
           hashing module. */
        perform HASH_FUNC(request_id, result, result_length);
        if this is a source request
        then
            /* Send a message to DM indicating */
            /* that all of the source records  */
            /* have been retrieved.            */
            perform DM_FinReq$RP_S(request_id);
        end if;
        /* Free the result buffer space. */
        perform Recp_free(request_id);
        /* Send a message to CC to      */
        /* release the locks for this   */
        /* request.                     */
        perform CC_FinReq$RP_S(request_id);
    end if;
else
    /* This request is not a retrieve-common
       request. Now, back to the original processing. */
end if;
end procedure RB$SEND_COMPLETION;


procedure XTRACT(input: TRACK_BUFFER, indexB, result2,
                        request, tmpl_ptr, target_ptr;
                 output: result2);
/*****************************************************************
 * This procedure extracts the attribute names and
 * values which correspond to the target list
 * of a record.
 * This procedure is called by the procedure
 * $RETR_PROCESSING().
 * We add an end-of-record marker, EOR, at the end
 * of every record.
 * The original procedure is in the rbabs.c file.
 *****************************************************************/

/* Process all statements of the original procedure
   until the end of the outermost while loop. */
/* Add the following processing. */
if the reqtype = RETRIEVE_COMMON
then
    put the EORecord marker into the result buffer;
end if;


/* Now, back to the original processing. */


end procedure XTRACT;


procedure RB$PUT_SEND(input: RESULT_BUFFER, result,
                             length_of_result);
/*****************************************************************
 * This procedure puts the results for a request
 * into the result buffer. If the result buffer is
 * full, then the contents of the buffer are sent to
 * the controller or the hashing module and the
 * length of the buffer is set to 0.
 * This procedure is called by the procedure
 * RETR_PROCESSING().
 * The original procedure is in the rbabs.c file.
 *****************************************************************/

if the result buffer is full
then
    /* Find the request type in the result buffer. */
    reqtype = FIND_req_type(result_buffer);
    if reqtype = RETRIEVE_COMMON
    then
        /* Send the results to the hashing module. */
        perform HASH_FUNC(result_buffer);
    else
        /* Send the results to the controller. */
        perform RES$CNTL$RP_S(request_id, results,
                              length_of_result);
    end if;
    length_of_result = 0;
else
    /* Store the results into the result buffer. */
    /* Now, back to the original processing. */
end if;


end procedure RB$PUT_SEND;
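RB$SEND_COMPLETION and RB$PUT_SEND make the same routing decision: a full or final result buffer goes to the hashing module for a RETRIEVE-COMMON request and to the controller otherwise, and XTRACT separates retrieve-common records with an end-of-record marker. The C sketch below models that behavior with counters in place of real message sends. The `ResultBuffer` layout, the 8-byte capacity, the `EOR_MARK` byte and the `rb_*` names are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define RB_CAP 8           /* hypothetical result-buffer capacity, in bytes */
#define EOR_MARK '\x1e'    /* hypothetical end-of-record marker byte */

typedef enum { RETRIEVE, RETRIEVE_COMMON } ReqType;

typedef struct {
    ReqType type;
    char data[RB_CAP];
    size_t length;
    int sent_to_hash;       /* flushes routed to the hashing module */
    int sent_to_controller; /* flushes routed to the controller */
} ResultBuffer;

/* "Send" the buffer to the destination chosen by the request type,
   then reset the length to 0. */
static void rb_flush(ResultBuffer *rb) {
    if (rb->type == RETRIEVE_COMMON)
        rb->sent_to_hash++;
    else
        rb->sent_to_controller++;
    rb->length = 0;
}

/* Append one byte of result data, flushing first if the buffer is full. */
static void rb_put(ResultBuffer *rb, char c) {
    if (rb->length == RB_CAP)
        rb_flush(rb);
    rb->data[rb->length++] = c;
}

/* Append a whole record; retrieve-common records get an end-of-record marker. */
static void rb_put_record(ResultBuffer *rb, const char *rec, size_t n) {
    assert(rec != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        rb_put(rb, rec[i]);
    if (rb->type == RETRIEVE_COMMON)
        rb_put(rb, EOR_MARK);
}
```

Calling `rb_flush` once more after the last record corresponds to sending the last result buffer of a request.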
procedure RP_CNL_ANOTHER_BE_MSG();
/*****************************************************************
 * The purpose of this procedure is to process
 * the messages received from the controller or
 * the other backends.
 * This procedure is modified for processing the
 * hashed information of the non-local target
 * records.
 * The original procedure is in the reproc.c file.
 *****************************************************************/

/* Get the message type. */
perform MsgType = Type$RP_R;

case MsgType of
    Bucket_info:
        /* This message is the hashed information */
        /* for the non-local target records.      */
        perform PROCESS_BE_TARGET();
        /* This procedure should return the sender, */
        /* the request_id of the target request     */
        /* and whether or not this is the last      */
        /* message from this backend.               */
        /* Check to see if all the target records   */
        /* of all the other backends have been      */
        /* received.                                */
        if LAST_MSG
        then
            perform CHECK_RECEIVE_MSG(sender,
                        request_id, ALL_RECEIVED);
        end if;
        if ALL_RECEIVED
        then
            perform START_TO_MERGE(request_id);
            /* The called routine will perform     */
            /* the merging operation and send the  */
            /* results to the controller.          */
        end if;
    /* Now, back to the original processing. */
end case;


end procedure RP_CNL_ANOTHER_BE_MSG;
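CHECK_RECEIVE_MSG has to remember, per target request, which backends have already sent their last hashed-target message. One way to do this is a bit mask compared against the set of all other backends, sketched below under stated assumptions: at most 32 backends, backends numbered from 0, one tracker per target request, and the hypothetical names `ReceiveTracker`, `tracker_init` and `check_receive_msg`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request record of which backends have finished sending. */
typedef struct {
    uint32_t received;   /* bit k set: backend k sent its last message */
    uint32_t expected;   /* bits of all backends other than this one */
} ReceiveTracker;

static void tracker_init(ReceiveTracker *t, unsigned n_backends, unsigned self) {
    assert(n_backends >= 1 && n_backends <= 32 && self < n_backends);
    uint32_t all = (n_backends == 32) ? UINT32_MAX
                                      : ((UINT32_C(1) << n_backends) - 1);
    t->expected = all & ~(UINT32_C(1) << self);
    t->received = 0;
}

/* Called when the last message from `sender` arrives; returns ALL_RECEIVED. */
static bool check_receive_msg(ReceiveTracker *t, unsigned sender) {
    assert(sender < 32);
    t->received |= UINT32_C(1) << sender;
    return (t->received & t->expected) == t->expected;
}
```

A mask makes duplicate last messages harmless and keeps the check constant-time; a per-request counter would also work if each backend is guaranteed to send exactly one last message.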


procedure PROCESS_BE_TARGET(input: message;
                            output: sender, request_id,
                                    LAST_RECORD);
/*****************************************************************
 * This procedure is called to process the message
 * which contains the hashed bucket information of
 * the non-local target records.
 * This procedure will return the sender of the
 * message, the request id of those non-local
 * records and a boolean variable, LAST_RECORD, to
 * indicate that all of the target records from the
 * sending backend have been received.
 * Data structures and variables used in this
 * procedure are:
 * 1. LAST_RECORD: A boolean variable which is
 *    used to indicate the end of this request.
 * 2. message: A character string which is used
 *    to store the hashed results of target
 *    records and is sent from the other backends.
 *****************************************************************/

/* Get the sender of the message. */
perform GET_MSG_SENDER(sender);
/* Get the request id of the request. */
perform GET_REQUEST_ID(request_id);
/* Now, check the global table to find the address */
/* of the hashing table for this request.          */
perform CHECK_GLOBAL_TABLE(request_id, hash_table,
                           NEW_REQUEST);
NEW_RECORD = true;
/* Since the message is an array of characters,      */
/* we have to bypass the header to get the record    */
/* information. If this message is the last message  */
/* of the sending backend, then there will be an     */
/* end-of-request marker, EORequest, in front        */
/* of the end-of-message marker.                     */
i = the integer which stands for the index where
    the records start;
/* Get the bucket numbers and their associated       */
/* records from the message, then insert them into   */
/* the correct buckets of the hashing table.         */
while ((not end of message) and (not end of request)) do
    perform GET_BUCKET_NUMBER(message, i, bucket_number);
    /* Get the bucket number of the record and the   */
    /* record itself from the message, and then      */
    /* store the record into the appropriate bucket  */
    /* of the hashing table by using the             */
    /* bucket number.                                */
    perform GET_A_RECORD_SET(message, i, set);
    perform STORE_RECORD_IN_HASH_TABLE(hash_table,
                bucket_number, set, NEW_RECORD);
    NEW_RECORD = false;
end while;
if EORequest
then LAST_RECORD = true;
else LAST_RECORD = false;
end if;
end procedure PROCESS_BE_TARGET;
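The parsing loop of PROCESS_BE_TARGET can be sketched in C if a concrete message layout is assumed. Here, purely as an assumption, a message is an array of integers: a header, then (bucket number, record) pairs, then an optional end-of-request marker and an end-of-message marker. Bucket numbers are non-negative, so negative marker values cannot be mistaken for them. `TargetTable`, `EOREQ`, `EOM` and `process_be_target` are hypothetical names; the real MBDS message is a character string with its own format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 4
#define BUCKET_CAP 8
#define EOREQ (-1)  /* hypothetical end-of-request marker */
#define EOM   (-2)  /* hypothetical end-of-message marker */

typedef struct {
    int records[NBUCKETS][BUCKET_CAP];
    size_t count[NBUCKETS];
} TargetTable;

/* Parse (bucket number, record) pairs starting at index `start` and store
   each record in its bucket. Stops at the end-of-request or end-of-message
   marker and returns true (LAST_RECORD) if the end-of-request marker was seen. */
static bool process_be_target(const int *msg, size_t start, TargetTable *table) {
    size_t i = start;
    while (msg[i] != EOM && msg[i] != EOREQ) {
        int bucket = msg[i];
        int record = msg[i + 1];
        assert(bucket >= 0 && bucket < NBUCKETS);
        assert(table->count[bucket] < BUCKET_CAP);
        table->records[bucket][table->count[bucket]++] = record;
        i += 2;
    }
    return msg[i] == EOREQ;
}
```

The loop stops at whichever marker comes first, which is why the condition is a conjunction: with an end-of-request marker placed in front of the end-of-message marker, continuing until both had been seen would read the end-of-message marker as a bucket number.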
procedure START_TO_MERGE(input: request_id);
/*****************************************************************
 * This procedure is called when the target record
 * set has been received from all of the other
 * backends.
 * The input request_id is the request id of the
 * target request.
 * The data structures and the variables used in
 * this procedure are:
 * 1. TARGET_TABLE: The hashing table for the
 *    target request.
 * 2. SOURCE_TABLE: The hashing table for the
 *    source request.
 * 3. target_id: The request id of the target
 *    request.
 * 4. source_id: The request id of the source
 *    request.
 *****************************************************************/

target_id = request_id;

/* Get the source request id. */
perform GET_SOURCE_ID(target_id, source_id);

/* Get the hashing table of the source request. */
perform CHECK_GLOBAL_TABLE(source_id, global_table,
                           source_hash_table,
                           NEW_REQUEST);

/* Get the hashing table of the target request. */
perform CHECK_GLOBAL_TABLE(target_id, global_table,
                           target_hash_table,
                           NEW_REQUEST);

/* Merge the records of these two requests and send */
/* the results to the controller.                   */
perform MERGE(source_id, source_hash_table.address,
              target_id, target_hash_table.address);
end procedure START_TO_MERGE;


procedure GET_SOURCE_ID(input: request_id;
                        output: request_id);
/*******************************************************
 * This procedure is used to find the request id for
 * the source request by using the request id of the
 * target request.
 * Recall that the source request and the target
 * request have the same traffic id; the difference
 * between them is that the request number of the
 * source request is less than that of the target
 * request by 1.
 *******************************************************/
end procedure GET_SOURCE_ID;
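The id arithmetic described above is small enough to show directly. The following Python sketch assumes that a request id is a (traffic id, request number) pair, as suggested by the req_id record used in BUCKET_BLOCK; the names RequestId, get_source_id and is_target_request are hypothetical. The even-number test mirrors the MOD(req_id.request_no, 2) = 0 check that BUCKET_BLOCK uses to recognize a target request.

```python
from typing import NamedTuple


class RequestId(NamedTuple):
    """Hypothetical request id: a traffic id plus a request number."""
    traffic_id: int
    request_no: int


def get_source_id(target_id: RequestId) -> RequestId:
    """The source request shares the traffic id; its number is one less."""
    return RequestId(target_id.traffic_id, target_id.request_no - 1)


def is_target_request(req_id: RequestId) -> bool:
    """Mirrors the MOD(request_no, 2) = 0 test used in BUCKET_BLOCK."""
    return req_id.request_no % 2 == 0


if __name__ == "__main__":
    target = RequestId(traffic_id=7, request_no=4)
    source = get_source_id(target)
    print(source, is_target_request(target), is_target_request(source))
```

Under this numbering a source request always has an odd request number and its target the following even number, so the two checks agree with each other.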

procedure CHECK_RECEIVE_MSG(input: sender, request_id;
                            output: ALL_RECEIVED);
/*******************************************************
 * This procedure is used to check whether all
 * of the non-local target records have been
 * received from all of the other backends for
 * a particular request. If all of the non-local
 * target records have been received, then
 * ALL_RECEIVED is set to true. Otherwise,
 * ALL_RECEIVED is set to false.
 *******************************************************/
end procedure CHECK_RECEIVE_MSG;
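One way to keep the bookkeeping that CHECK_RECEIVE_MSG needs is a per-request set of backends that have finished sending. The Python sketch below is a minimal illustration under two assumptions not stated in the specification: the number of backends is fixed, and each backend flags its last message for a request with the end-of-request marker. ReceiveTracker and its method are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class ReceiveTracker:
    """Hypothetical bookkeeping for CHECK_RECEIVE_MSG.

    Assumes a fixed set of backends and that each one marks its last
    message for a request with an end-of-request flag.
    """

    def __init__(self, local_backend: int, num_backends: int) -> None:
        # Non-local target records come from every backend but this one.
        self.expected: Set[int] = set(range(num_backends)) - {local_backend}
        self.finished: Dict[Hashable, Set[int]] = defaultdict(set)

    def check_receive_msg(self, sender: int, request_id: Hashable,
                          end_of_request: bool) -> bool:
        """Record a message from `sender` and return ALL_RECEIVED."""
        if end_of_request:
            self.finished[request_id].add(sender)
        # True once every other backend has sent its last message.
        return self.finished[request_id] >= self.expected
```

When the call returns true, the backend has the whole non-local target record set and could proceed to START_TO_MERGE.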


procedure CHECK_GLOBAL_TABLE(input: request_id;
                             output: hash_table,
                                     NEW_REQUEST);
/*******************************************************
 * This procedure is used to check whether a request
 * is a new request by checking if the request id is
 * in the global table. If the id is found, then set
 * the value of NEW_REQUEST to false and return
 * NEW_REQUEST and the hash table of the request.
 * This procedure has been defined in HASH_FUNC().
 *******************************************************/
end procedure CHECK_GLOBAL_TABLE;

procedure GET_BUCKET_NUMBER(input: message, index;
                            output: index, bucket_number);
/*******************************************************
 * This procedure is used to extract the bucket
 * number from the message, and then return the
 * bucket_number and the incremented index to its
 * caller.
 * Data structures and variables used in this
 * procedure are:
 * 1. bucket: A character-string representation
 *    of the bucket number.
 * 2. j: A general purpose index.
 *******************************************************/
j = 0;
repeat
    bucket[j] = message[index];
    index = index + 1;
    j = j + 1;
until message[index] = EOV;
index = index + 1;   /* Skip the EOV marker. */
perform STRING_TO_INTEGER(bucket, bucket_number);
end procedure GET_BUCKET_NUMBER;


procedure GET_A_RECORD_SET(input: message, i;
                           output: set);
/*******************************************************
 * This procedure is used to extract the common
 * attribute value of a record and the record itself
 * from the message which contains the hashed bucket
 * information of the non-local target records.
 * The data structures and the variables used in
 * this procedure are:
 * 1. set: An array which contains the common
 *    attribute value of a record and the
 *    record itself.
 * 2. j: A general purpose index.
 *******************************************************/
j = 0;
repeat
    set[j] = message[i];
    i = i + 1;
    j = j + 1;
until message[i-1] = EORecord;
end procedure GET_A_RECORD_SET;
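The message walk performed by PROCESS_BE_TARGET, GET_BUCKET_NUMBER and GET_A_RECORD_SET can be sketched in Python as follows. This is a sketch under stated assumptions, not the MBDS code. The marker characters are hypothetical stand-ins for the real definitions. The message layout (a bucket string ended by EOV, then the common attribute value and the record ended by EORecord, with an optional EORequest before the end-of-message marker) is inferred from the pseudocode. A dictionary of lists stands in for the disk-resident hashing table. The loop stops at whichever of the two end markers it meets first.

```python
from typing import Dict, List, Tuple

# Hypothetical single-character markers; the real values are defined
# in the MBDS header files and are not given in this appendix.
EOV, EORECORD, EOREQUEST, EOM = "\x01", "\x02", "\x03", "\x04"


def get_bucket_number(message: str, index: int) -> Tuple[int, int]:
    """Read the bucket digits up to EOV; return (index past EOV, bucket)."""
    end = message.index(EOV, index)
    return end + 1, int(message[index:end])


def get_a_record_set(message: str, index: int) -> Tuple[int, str]:
    """Copy characters up to and including EORecord."""
    end = message.index(EORECORD, index)
    return end + 1, message[index:end + 1]


def process_be_target(message: str,
                      start: int) -> Tuple[Dict[int, List[str]], bool]:
    """Place every (bucket, record set) pair into a dict-based table.

    `start` is the index just past the message header.  Returns the
    table and LAST_RECORD (True if an EORequest marker was found).
    """
    table: Dict[int, List[str]] = {}
    i = start
    while message[i] not in (EOM, EOREQUEST):
        i, bucket = get_bucket_number(message, i)
        i, record_set = get_a_record_set(message, i)
        table.setdefault(bucket, []).append(record_set)
    return table, message[i] == EOREQUEST
```

For example, a message whose body is `3 EOV x EOV rec1 EORecord EORequest EOM` yields one entry in bucket 3 and sets LAST_RECORD.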


APPENDIX D

THE HASHING PROCEDURE PROGRAM SPECIFICATIONS


procedure HASH_FUNCTION(input: request_id, result, length;
                        output: request_id, hashed_result,
                                length_hashed_result);


/*******************************************************
 * The purpose of this procedure is to hash the value
 * of the join attribute into a bucket of the hash
 * table.
 * A hash buffer is reserved to store the hashed
 * results.
 * Data structures and variables used in this
 * procedure are:
 * 1. hash_buffer: A variable of the data type
 *    hashing_buffer which is used to store the
 *    records and their hashed bucket values, and is
 *    defined in hashing_module.def.
 * 2. RP_rid_info: The information for a request.
 *    This structure is defined in the commdata.def
 *    file.
 * 3. RP_rid_ptr: A pointer to the data structure
 *    of type RP_rid_info.
 * 4. req_tbl_ptr: A pointer to a request table.
 *    The request table is defined in the
 *    commdata.def file as a REQtbl definition
 *    structure.
 * 5. temp_entry: A variable of data type rt_entry
 *    which is defined in commdata.def.
 * 6. tem_ptr: A pointer to temp_entry.
 * 7. rt_entry: A pointer to a field of RP_rid_info.
 *    The type of this field is rt_entry.
 *******************************************************/


/* Check if the request id is a new request. */
if NEW_REQUEST
then
    /* Get the record template to find the value */
    /* type (i.e., integer, string or float) of the */
    /* common attribute value. */
    perform FIND_RP_rid_info(request_id, RP_rid_ptr);
    /* Get a pointer to the request table from the */
    /* RP_rid_info. */
    req_tbl_ptr = RP_rid_ptr -> RP_rid_req;
    /* Find the attribute name from */
    /* the request table. */
    perform FIND_COMMON_ATTRIBUTE(req_tbl_ptr,
                                  attribute_name);
    /* Get a pointer to the entry */
    /* of the template for the common attribute. */
    tem_ptr = RP_rid_ptr -> RP_rid_tmpl_ptr -> rt_entry;
    /* Get the value type of the common attribute */
    /* from the record template. */
    if tem_ptr->temp_entry.value_data_type = 's'
    then
        value_type = string;
    else
        /* If the value type is integer, then */
        /* we decide which hashing function to */
        /* use. */
        MAX = tem_ptr.value_c1;   /* The possible maximum */
                                  /* value for this */
                                  /* attribute. */
        MIN = tem_ptr.value_c2;   /* The possible minimum */
                                  /* value for this */
                                  /* attribute. */
        if (MAX - MIN) < the number of buckets
        then
            value_type = small_integer;
        else
            range = (MAX - MIN) / the number of buckets;
            value_type = large_integer;
        end if;
    end if;
end if;
/* Allocate a buffer to store the hashed results. */
perform ALLOCATE_BUFFER(request_id, hash_buffer);
/* Note: we may not want to call this */
/* routine at this point. */
switch (value_type)
    case string:
        perform STRING_HASH(result,
                            hash_buffer);
    case small_integer:
        perform SMALL_INTEGER_HASH(result, MIN,
                                   hash_buffer);
    case large_integer:
        perform LARGE_INTEGER_HASH(result, MIN,
                                   range,
                                   hash_buffer);
end switch;
end procedure HASH_FUNCTION;
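The choice among the three hashing routines depends only on the template's value type and bounds. The Python sketch below restates that decision. ValueType and choose_hash are hypothetical names. As in the pseudocode, only the type code 's' selects the string hash; every other type code falls through to the integer cases. No separate float branch appears in the pseudocode, so none is added here.

```python
from enum import Enum
from typing import Optional, Tuple


class ValueType(Enum):
    STRING = "string"
    SMALL_INTEGER = "small integer"
    LARGE_INTEGER = "large integer"


def choose_hash(data_type: str, max_value: int = 0, min_value: int = 0,
                num_buckets: int = 1) -> Tuple[ValueType, Optional[int]]:
    """Pick the hashing routine the way HASH_FUNCTION does.

    Returns the value type and, for large integers, the number of
    consecutive values that share one bucket (the `range` variable).
    """
    if data_type == "s":
        return ValueType.STRING, None
    spread = max_value - min_value
    if spread < num_buckets:
        # Every possible value can have its own bucket.
        return ValueType.SMALL_INTEGER, None
    # Truncating division, as written in the pseudocode.
    return ValueType.LARGE_INTEGER, spread // num_buckets
```

With 2048 buckets (the size of the string lookup table, assumed here to apply to integers as well), an attribute ranging over 0..99 gets the small-integer hash, while one ranging over 0..100000 gets the large-integer hash with a range of 48.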


procedure FIND_COMMON_ATTRIBUTE(input: request_table;
                                output: attribute_name);
/*******************************************************
 * This procedure is used to find the name of the
 * join attribute.
 * The join attribute is the first attribute of the
 * target list, so we can just go to the entry
 * where the target list begins, extract the first
 * attribute name, and then return it to the calling
 * procedure.
 *******************************************************/
end procedure FIND_COMMON_ATTRIBUTE;


procedure ALLOCATE_BUFFER(input: request_id;
                          output: hash_buffer);
/*******************************************************/
/* This procedure is used to allocate a buffer for     */
/* storing the records and their hashed bucket numbers,*/
/* set the length of the buffer to 0, and then         */
/* return the buffer to the calling procedure.         */
/*                                                     */
/* The data structures and the variables used in       */
/* this procedure are:                                 */
/* 1. hash_buffer:                                     */
/*    A variable of the data type hashing_buffer,      */
/*    which is defined in hashing_module.def           */
/*    (see Appendix G).                                */
/* 2. HB_ptr:                                          */
/*    A pointer to the hash buffer.                    */
/* 3. HB_id:                                           */
/*    A field name of the hash buffer that             */
/*    contains the request id of the records           */
/*    which belong to this buffer.                     */
/*******************************************************/
HB_ptr = allocate the hash buffer;
HB_ptr->HB_id = request_id;
HB_ptr->length = 0;
end procedure ALLOCATE_BUFFER;


procedure STRING_HASH(input: result_buffer, h_buffer);
/*******************************************************
 * This procedure is called when the value type
 * of the common attribute is a character string.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the join attribute
 *    from the extracted record and then check the
 *    lookup table to get the bucket number for
 *    the record.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer, h_buffer.
 * 4. If the hash buffer is full, then send the
 *    hash buffer to the bucket-block tracking
 *    procedure.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *    representation of the common attribute value.
 * 2. record: A character-string representation
 *    of the extracted record.
 * 3. bucket_number: The bucket number where the
 *    record characterized by the common attribute
 *    value is hashed into.
 * 4. bucket: A character-string representation
 *    of the bucket number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *    that this record is the last record for the
 *    request.
 * 9. i: The index for the length of the result
 *    buffer.
 *    j: A general purpose index.
 * 10. lookup: The lookup table, which is an array
 *     with 2048 character-string elements.
 *         +------+--------+
 *         |    0 | aba    |
 *         |------+--------|
 *         |    1 | abc    |
 *         |------+--------|
 *         |    : |   :    |
 *         |------+--------|
 *         | 2047 | zyth   |
 *         +------+--------+
 * 11. h_buffer: A variable of type hash_buffer
 *     which is defined in hashing_module.def
 *     (see Appendix G) and is used to store
 *     records and their hashed values.
 *******************************************************/
/* Get the lookup table. */
i = 1;
j = 0;
LAST_RECORD = false;
/* Get records from the result buffer one at a time. */
while result_buffer[i] <> EOB do
    /* Bypass the name of the common attribute. */
    while result_buffer[i] <> EON do
        i = i + 1;
    end while;   /* Now, result_buffer[i] = EON. */
    i = i + 1;
    /* Get the value of the join attribute. */
    j = 0;   /* Reset the index for each record. */
    while result_buffer[i] <> EOV do
        attribute_value[j] = result_buffer[i];
        i = i + 1;
        j = j + 1;
    end while;   /* Now, result_buffer[i] = EOV. */
    /* Compare the common attribute value with */
    /* the contents of the lookup table to get the */
    /* bucket number. */
    bucket_number = BI_SEARCH(lookup, attribute_value);
    perform NUMBER_TO_STRING(bucket_number, bucket);
    /* Add an EOV marker to the end of */
    /* the attribute value. */
    attribute_value[j] = EOV;
    /* Extract records from the buffer. */
    i = i + 1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i + 1;
        j = j + 1;
    until result_buffer[i-1] = EORecord;
    /* Now, record[j-1] = EORecord. */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i + 1;
    end if;
    /* Store the hashed information into the */
    /* hash buffer, h_buffer. */
    perform PUT_HASH_BUFFER(h_buffer, bucket,
                            attribute_value, record,
                            LAST_RECORD);
end while;
end procedure STRING_HASH;
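The two halves of STRING_HASH, walking the result buffer and looking a value up in the sorted table, can be sketched in Python with the standard `bisect` module standing in for BI_SEARCH. Several details here are assumptions: the marker characters are hypothetical, `lookup[k]` is assumed to hold the smallest string that belongs to bucket k (the specification does not say how the 2048 entries are chosen), and values below the first entry go to bucket 0. The function names are illustrative only.

```python
import bisect
from typing import List, Tuple

# Hypothetical marker characters.
EOV, EORECORD, EOREQUEST, EON, EOB = "\x01", "\x02", "\x03", "\x05", "\x06"


def split_result_buffer(buf: str) -> List[Tuple[str, str, bool]]:
    """Return (attribute_value, record, LAST_RECORD) for every record."""
    out: List[Tuple[str, str, bool]] = []
    i = 0
    while buf[i] != EOB:
        i = buf.index(EON, i) + 1          # bypass the attribute name
        end = buf.index(EOV, i)
        value = buf[i:end]                 # the join attribute value
        i = end + 1                        # skip EOV
        end = buf.index(EORECORD, i)
        record = buf[i:end + 1]            # keep EORecord, as the pseudocode does
        i = end + 1
        last = buf[i] == EOREQUEST
        if last:
            i += 1
        out.append((value, record, last))
    return out


def string_bucket(lookup: List[str], value: str) -> int:
    """Binary search: the bucket whose lower-boundary string is <= value."""
    return max(bisect.bisect_right(lookup, value) - 1, 0)
```

Because the table is sorted, each lookup costs about log2(2048) = 11 string comparisons.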


procedure PUT_HASH_BUFFER(input: h_buffer,
                                 bucket,
                                 attribute_value, record,
                                 LAST_RECORD;
                          output: h_buffer);
/*******************************************************
 * This procedure is used to store the hashed
 * record information into the hash_buffer.
 * Data structures and variables used in this
 * procedure are:
 * 1. X, Y, Z, i, j, K: General purpose indexes.
 * 2. MAX: The predefined maximum length of the
 *    hash buffer.
 * 3. bucket: A character-string representation
 *    of the bucket number.
 * 4. record: The input record which is in the
 *    form of a character string.
 * 5. LAST_RECORD: A boolean variable which is
 *    used to indicate the end of this request.
 * 6. h_buffer: A buffer which is used to store
 *    records and their hashed values.
 *******************************************************/


/* Check to see if the buffer has enough space for */
/* the new record. */
X = String_len(bucket);
Y = String_len(attribute_value);
Z = String_len(record);
K = the current length of the hash buffer;
if (K + X + Y + Z) > MAX
then
    /* The buffer is full, so it is sent to the */
    /* bucket-block tracking procedure. */
    perform BUCKET_BLOCK(h_buffer);
    /* Reset the length of the buffer to 0. */
    K = 0;
end if;
/* The buffer now has enough space, so store the */
/* input record into the buffer. */
for i = 1 to X do
    K = K + 1;
    hash_result[K] = bucket[i];
end for;
for i = 1 to Y do
    K = K + 1;
    hash_result[K] = attribute_value[i];
end for;
for i = 1 to Z do
    K = K + 1;
    hash_result[K] = record[i];
end for;
/* If this is the last record of this request, */
/* then send the hash buffer to the */
/* bucket-block tracking procedure. */
if LAST_RECORD
then
    hash_result[K+1] = EORequest;
    hash_result[K+2] = EOB;
    perform BUCKET_BLOCK(h_buffer);
    perform FREE_BUFFER_SPACE(h_buffer);
end if;
end procedure PUT_HASH_BUFFER;
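The buffering policy of PUT_HASH_BUFFER can be illustrated with the following Python sketch: append each hashed entry, hand the buffer to the bucket-block tracking procedure when the next entry would not fit, and hand it over again after the last record. Much of it is assumed rather than specified. HashBuffer, put_hash_buffer and the `bucket_block` callback are hypothetical. An EOV is placed after the bucket string because BUCKET_BLOCK reads the bucket up to EOV. A hypothetical MORE marker fills the end-of-request slot of a non-final buffer, and two characters are reserved for the trailer. The entry that triggers a flush is still stored afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical marker characters.
EOV, EOREQUEST, EOB = "\x01", "\x03", "\x06"
MORE = "\x07"  # hypothetical "more buffers follow" trailer


@dataclass
class HashBuffer:
    request_id: str
    max_len: int
    contents: List[str] = field(default_factory=list)  # like hash_result[]


def _send(h_buffer: HashBuffer, trailer: str,
          bucket_block: Callable[[str, str], None]) -> None:
    """Hand the contents plus two trailer characters to BUCKET_BLOCK."""
    bucket_block(h_buffer.request_id,
                 "".join(h_buffer.contents) + trailer + EOB)
    h_buffer.contents.clear()  # the buffer length goes back to 0


def put_hash_buffer(h_buffer: HashBuffer, bucket: str, attribute_value: str,
                    record: str, last_record: bool,
                    bucket_block: Callable[[str, str], None]) -> None:
    """Append one hashed entry, flushing first if it would not fit."""
    entry = bucket + EOV + attribute_value + EOV + record
    # Reserve two characters for the trailer markers.
    if (h_buffer.contents and
            len(h_buffer.contents) + len(entry) + 2 > h_buffer.max_len):
        _send(h_buffer, MORE, bucket_block)
    h_buffer.contents.extend(entry)
    if last_record:
        _send(h_buffer, EOREQUEST, bucket_block)
```

Passing the tracking procedure in as a callback keeps the sketch testable without a backend; in MBDS the call is a direct `perform BUCKET_BLOCK`.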


procedure SMALL_INTEGER_HASH(input: result_buffer,
                                    MIN,
                                    h_buffer;
                             output: h_buffer);
/*******************************************************
 * This procedure is used when the type of the
 * common attribute value is integer and when the
 * difference of the maximum and minimum value of
 * the common attribute value is less than the
 * number of the buckets of the hashing table.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the common attribute from
 *    the extracted record and then calculate
 *    the bucket number.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer.
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *    representation of the common attribute value.
 * 2. record: A character-string representation
 *    of the extracted record.
 * 3. bucket_number: The bucket number where the
 *    record characterized by the common attribute
 *    value is hashed into.
 * 4. bucket: A character-string representation
 *    of the bucket_number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *    that this record is the last record for the
 *    request.
 * 9. i: The index for the length of the result
 *    buffer.
 *    j: A general purpose index.
 *    k: The index for the length of the
 *    attribute_value.
 * 10. temp: An integer representation of the input
 *     attribute value.
 * 11. h_buffer: A variable of type hash_buffer
 *     which is defined in hashing_module.def
 *     (see Appendix G) and is used to store
 *     records and their hashed values.
 *******************************************************/


/* Initialize the indexes. */
i = 1;
k = 1;
j = 0;
LAST_RECORD = false;
/* Get the records from the result buffer */
/* one at a time. */
while result_buffer[i] <> EOB do
    /* Bypass the name of the common attribute. */
    while result_buffer[i] <> EON do
        i = i + 1;
    end while;   /* Now, result_buffer[i] is EON. */
    i = i + 1;
    /* Get the value of the common attribute. */
    k = 1;   /* Reset the index for each record. */
    while result_buffer[i] <> EOV do
        attribute_value[k] = result_buffer[i];
        i = i + 1;
        k = k + 1;
    end while;   /* Now, result_buffer[i] is EOV. */
    /* Compute the bucket number. */
    perform STRING_TO_NUMBER(attribute_value, temp);
    bucket_number = temp - MIN;
    perform NUMBER_TO_STRING(bucket_number, bucket);
    /* Add an EOV marker to the end of attribute_value. */
    attribute_value[k] = EOV;
    /* Get the attribute-value pairs of the actual */
    /* target list of the record. */
    i = i + 1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i + 1;
        j = j + 1;
    until result_buffer[i-1] = EORecord;
    /* Now, record[j-1] is EORecord. */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i + 1;
    end if;
    /* Store the hashed information into the h_buffer. */
    perform PUT_HASH_BUFFER(h_buffer, bucket,
                            attribute_value, record,
                            LAST_RECORD);
end while;
end procedure SMALL_INTEGER_HASH;


procedure LARGE_INTEGER_HASH(input: result_buffer,
                                    MIN, range,
                                    h_buffer;
                             output: h_buffer);
/*******************************************************
 * This procedure is used when the type of the
 * common attribute value is integer and when the
 * difference of the maximum and minimum value of
 * the common attribute value is greater than or
 * equal to the number of the buckets of the
 * hashing table.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the common attribute from
 *    the extracted record and then calculate
 *    the bucket number.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer.
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *    representation of the common attribute value.
 * 2. record: A character-string representation
 *    of the extracted record.
 * 3. bucket_number: The bucket number where the
 *    record characterized by the common attribute
 *    value is hashed into.
 * 4. bucket: A character-string representation
 *    of the bucket number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *    that this record is the last record for the
 *    request.
 * 9. i: The index for the length of the result
 *    buffer.
 *    j: A general purpose index.
 *    k: The index for the length of the
 *    attribute_value.
 * 10. temp: An integer representation of the input
 *     attribute_value.
 * 11. h_buffer: A variable of type hash_buffer
 *     which is defined in hashing_module.def
 *     (see Appendix G) and is used to store
 *     records and their hashed values.
 *******************************************************/


/* Initialize the indexes. */
i = 1;
k = 1;
j = 0;
LAST_RECORD = false;
/* Get records from the result buffer one at a time. */
while result_buffer[i] <> EOB do
    /* Bypass the name of the common attribute. */
    while result_buffer[i] <> EON do
        i = i + 1;
    end while;   /* Now, result_buffer[i] is EON. */
    i = i + 1;
    /* Get the value of the join attribute. */
    k = 1;   /* Reset the index for each record. */
    while result_buffer[i] <> EOV do
        attribute_value[k] = result_buffer[i];
        i = i + 1;
        k = k + 1;
    end while;   /* Now, result_buffer[i] is EOV. */
    /* Compute the bucket number. */
    perform STRING_TO_NUMBER(attribute_value, temp);
    bucket_number = TRUNC((temp - MIN) / range);
    perform NUMBER_TO_STRING(bucket_number, bucket);
    /* Add an EOV marker to the end of attribute_value. */
    attribute_value[k] = EOV;
    /* Get the attribute-value pairs of the actual */
    /* target list of the record. */
    i = i + 1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i + 1;
        j = j + 1;
    until result_buffer[i-1] = EORecord;
    /* Now, record[j-1] is EORecord. */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i + 1;
    end if;
    /* Store the hashed information into the h_buffer. */
    perform PUT_HASH_BUFFER(h_buffer, bucket,
                            attribute_value, record,
                            LAST_RECORD);
end while;
end procedure LARGE_INTEGER_HASH;
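The two integer bucket formulas are short enough to check by hand. The Python sketch below restates them. It also shows one property of computing `range` by truncating division, as HASH_FUNCTION does: the maximum value can land past the last bucket. For example, with MAX - MIN = 10 and 4 buckets, range = 2 and MAX maps to bucket 5. The `covering_range` helper is a hypothetical alternative, not part of the specification. It rounds the range up over the MAX - MIN + 1 possible values so that every value falls in buckets 0 to n - 1.

```python
def small_integer_bucket(value: int, min_value: int) -> int:
    """SMALL_INTEGER_HASH: one bucket per possible value."""
    return value - min_value


def large_integer_bucket(value: int, min_value: int, value_range: int) -> int:
    """LARGE_INTEGER_HASH: TRUNC((temp - MIN) / range)."""
    return (value - min_value) // value_range


def truncated_range(max_value: int, min_value: int, num_buckets: int) -> int:
    """`range` as written in HASH_FUNCTION: (MAX - MIN) / buckets, truncated."""
    return (max_value - min_value) // num_buckets


def covering_range(max_value: int, min_value: int, num_buckets: int) -> int:
    """Hypothetical alternative: round up so MAX stays in the last bucket."""
    spread = max_value - min_value + 1   # number of distinct values
    return -(-spread // num_buckets)     # ceiling division
```

Because the large-integer case is only chosen when MAX - MIN is at least the number of buckets, `truncated_range` is always at least 1, so the division in `large_integer_bucket` never divides by zero.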


THE BUCKET-BLOCK-TRACKING PROCEDURE PROGRAM SPECIFICATIONS


procedure BUCKET_BLOCK(input: H_buffer);
/*******************************************************
 * This procedure receives a hash buffer, H_buffer,
 * from the ret_com subfunction and performs the
 * following tasks:
 * 1. Establish and maintain a global table to
 *    store the addresses of the hashing tables
 *    of all the requests.
 * 2. Extract the hashed record information from
 *    the input hash_buffer.
 * 3. Check the global table to see if the input
 *    records belong to a new request. If they do,
 *    then allocate a new hashing table.
 *    Otherwise, get the logical address of the
 *    hashing table from the global table and
 *    assign a pointer to the hashing table.
 * 4. Group records into the buckets according to
 *    their bucket numbers and store them into
 *    blocks.
 * 5. Broadcast the bucket information of the local
 *    target records to the other backends.
 * 6. Store the hashing table back to the secondary
 *    storage.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. FIRST_RET_COM:
 *    A boolean variable which is set to
 *    true when the first retrieve-common
 *    request enters the system.
 * 2. GT_ptr:
 *    A pointer to a global table.
 * 3. G_table:
 *    A variable of type global_table (see
 *    Appendix G).
 * 4. HT_ptr:
 *    A pointer to a hashing table.
 * 5. HT:
 *    A variable of type hash_table (see
 *    Appendix G).
 * 6. HB_ptr:
 *    A pointer to a hash buffer.
 * 7. H_buffer:
 *    A variable of type hash_buffer (see
 *    Appendix G).
 * 8. NEW_REQUEST:
 *    A boolean variable which is set to
 *    true if the request id cannot be found
 *    in the global table.
 * 9. logical_addr:
 *    A variable of type addr_definition,
 *    which is defined in the commdata.def file.
 * 10. bucket_number:
 *    The bucket number where the record
 *    characterized by the attribute value is
 *    hashed into.
 * 11. bucket:
 *    A character-string representation of
 *    the bucket number.
 * 12. req_id:
 *    A record which contains the traffic id and
 *    request number of a request.
 * 13. i, j:
 *    General purpose indexes.
 *******************************************************/


if FIRST_RET_COM
then
    perform INITIALIZE_GLOBAL_TABLE(GT_ptr);
    FIRST_RET_COM = false;
end if;
/* Get the request id from the pointer which */
/* points to the input hash buffer. */
req_id = H_buffer.Request_id;
/* Check the global table to see if this request is */
/* a new request. */
perform CHECK_GLOBAL_TABLE(GT_ptr, req_id,
                           logical_addr, NEW_REQUEST);
if NEW_REQUEST
then
    perform ALLOCATE_HASH_TABLE(logical_addr);
    perform INSERT_GLOBAL_TABLE(GT_ptr, req_id,
                                logical_addr);
end if;
perform GET_HASHING_TABLE(req_id,
                          logical_addr, HT);
/* Now, the hashing table is ready to store records. */
/* Extract the record information from the */
/* hash buffer one record at a time. */
/* Because the last two characters of the hash buffer */
/* are the EORequest marker, which indicates whether */
/* this is the last hash buffer for this request, */
/* and the EOBuffer marker, which indicates the */
/* end of this hash buffer, the actual length of the */
/* hash buffer is length-2. */
NEW_RECORD = true;
j = 0;
while j < (H_buffer.length - 2) do
    /* Get the bucket number. */
    i = 0;
    repeat
        bucket[i] = H_buffer.Hashed_result[j];
        i = i + 1;
        j = j + 1;
    until H_buffer.Hashed_result[j] = EOV;
    /* Convert the bucket number from a character */
    /* string to an integer. */
    bucket_number = STRING_TO_INTEGER(bucket);
    /* Get the common attribute value and the record */
    /* itself. */
    j = j + 1;   /* Skip the EOV marker. */
    i = 0;
    repeat
        common_and_record[i] = H_buffer.Hashed_result[j];
        i = i + 1;
        j = j + 1;
    until common_and_record[i-1] = EORecord;
    /* Store the record and its common attribute value */
    /* into the hashing table. */
    perform STORE_RECORD_IN_HASH_TABLE(HT, bucket_number,
                                       common_and_record,
                                       NEW_RECORD);
    NEW_RECORD = false;
end while;
/* Check if this is a target request. */
if MOD(req_id.request_no, 2) = 0
then
    /* This is a target request. */
    perform BROADCAST(...);
end if;
perform STORE_BACK(HT, logical_addr);
end procedure BUCKET_BLOCK;
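The extraction loop at the heart of BUCKET_BLOCK (tasks 2 and 4) can be sketched in Python. This sketch assumes each hashed entry is laid out as a bucket string, EOV, then the common attribute value and record ended by EORecord, and that the buffer ends with the end-of-request slot followed by EOB. It assumes hypothetical marker characters, and a dictionary stands in for the bucket structure of the hashing table. parse_hash_buffer and group_into_buckets are illustrative names; global-table handling, broadcasting and STORE_BACK are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical marker characters.
EOV, EORECORD, EOREQUEST, EOB = "\x01", "\x02", "\x03", "\x06"


def parse_hash_buffer(data: str) -> Tuple[List[Tuple[int, str]], bool]:
    """Split a hash buffer into (bucket number, common-and-record) pairs.

    The last two characters are the end-of-request slot and EOB, so
    only data[:-2] holds hashed records.  Also returns whether this is
    the last buffer of the request.
    """
    body_len = len(data) - 2
    pairs: List[Tuple[int, str]] = []
    j = 0
    while j < body_len:
        end = data.index(EOV, j)
        bucket_number = int(data[j:end])
        j = end + 1                                   # skip EOV after the bucket
        end = data.index(EORECORD, j)
        pairs.append((bucket_number, data[j:end + 1]))
        j = end + 1
    return pairs, data[-2] == EOREQUEST


def group_into_buckets(pairs: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Task 4: collect the records of each bucket together."""
    table: Dict[int, List[str]] = {}
    for bucket_number, info in pairs:
        table.setdefault(bucket_number, []).append(info)
    return table
```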


procedure INITIALIZE_GLOBAL_TABLE(output: GT_ptr);
/*******************************************************
 * This procedure is used when the first
 * retrieve-common request is executed in the
 * BUCKET_BLOCK procedure.
 * This procedure creates a global table and
 * returns the pointer (GT_ptr) to the table to
 * the calling procedure.
 *******************************************************/
end procedure INITIALIZE_GLOBAL_TABLE;


procedure ALLOCATE_HASH_TABLE(output: logical_addr);
/*******************************************************
 * This procedure is used to allocate a hashing
 * table for a new retrieve-common request from
 * a predefined secondary storage area and return
 * the logical disk address to the calling
 * procedure.
 * The bucket entries are also initialized.
 *******************************************************/
end procedure ALLOCATE_HASH_TABLE;

procedure CHECK_GLOBAL_TABLE(input: GT_ptr, request_id;
                             output: logical_addr, NEW_REQUEST);
/*******************************************************
 * This procedure is used to check whether a request
 * is a new request by checking its request id
 * against the global table. If the request id is
 * found in the global table, then set the value of
 * NEW_REQUEST to false and return the logical disk
 * address of the hashing table to the calling
 * procedure. Otherwise, return NEW_REQUEST (set to
 * true) back to the calling procedure.
 *******************************************************/
end procedure CHECK_GLOBAL_TABLE;


procedure INSERT_GLOBAL_TABLE(input: GT_ptr, Req_id,
                                     logical_addr;
                              output: GT_ptr);
/*******************************************************
 * This procedure is used to insert a new hashing
 * table into the global table.
 * Data structures and variables used in this
 * procedure are:
 * 1. GT_ptr:
 *    A pointer to the global table.
 * 2. Req_id:
 *    The request id of the records of the new
 *    hashing table.
 * 3. logical_addr:
 *    The logical disk address of the new hashing
 *    table.
 *
 * An inverted list implementation to maintain the
 * table is recommended.
 *******************************************************/
end procedure INSERT_GLOBAL_TABLE;
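The three global-table procedures (INITIALIZE_GLOBAL_TABLE, CHECK_GLOBAL_TABLE and INSERT_GLOBAL_TABLE) share one data structure. In the Python sketch below, a built-in dictionary keyed by request id is swapped in for the recommended inverted list purely for brevity. The class name and the integer stand-in for a logical disk address are hypothetical.

```python
from typing import Dict, Hashable, Optional, Tuple


class GlobalTable:
    """Hypothetical global table: request id -> hashing-table address."""

    def __init__(self) -> None:  # INITIALIZE_GLOBAL_TABLE
        self._entries: Dict[Hashable, int] = {}

    def check(self, request_id: Hashable) -> Tuple[Optional[int], bool]:
        """CHECK_GLOBAL_TABLE: return (logical_addr, NEW_REQUEST)."""
        if request_id in self._entries:
            return self._entries[request_id], False
        return None, True

    def insert(self, request_id: Hashable, logical_addr: int) -> None:
        """INSERT_GLOBAL_TABLE: remember where the new hashing table lives."""
        self._entries[request_id] = logical_addr
```

In BUCKET_BLOCK terms, a `(None, True)` result from `check` is the signal to call ALLOCATE_HASH_TABLE and then `insert`.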


procedure STORE_RECORD_IN_HASH_TABLE
          (input: HT, bucket_number,
                  info, NEW_RECORD);
/*******************************************************
 * This procedure is used to store the common
 * attribute value of a record and the record itself
 * into a hashing table.
 * Recall that the records are stored in blocks.
 * Data structures and the variables used in this
 * procedure are:
 * 1. HT:
 *    A variable of type hash_table which is
 *    defined in hashing_module.def (see Appendix
 *    G).
 * 2. bucket_number:
 *    The bucket number where the record
 *    characterized by the common attribute value
 *    is hashed into.
 * 3. info:
 *    A character string which contains the
 *    common attribute value of a record and the
 *    record itself.
 * 4. NEW_RECORD:
 *    A boolean variable to indicate whether the
 *    input info is a new record of this request
 *    id.
 * 5. old_bucket_number:
 *    The bucket number of the previous input
 *    record.
 * 6. bkt:
 *    A variable of type BUCKET_ENTRY which is
 *    defined in hashing_module.def (see Appendix
 *    G).
 * 7. BLK_ptr:
 *    A pointer to a record block of type
 *    REC_BLOCK which is defined in
 *    hashing_module.def (see Appendix G).
 * 8. blk, blk_2:
 *    Variables of type REC_BLOCK which is defined
 *    in hashing_module.def (see Appendix G).
 * 9. I:
 *    An integer variable.
 * 10. MAX_BLOCK_SIZE:
 *    An integer that represents the maximum
 *    length of the block content.
 *******************************************************/

if NEW_RECORD
then
    /* This record is the first input record of this */
    /* request. */
    perform GET_THE_BUCKET(HT, bucket_number, bkt);
    perform ALLOCATE_REC_BLOCK(blk);
    perform MODIFY_ENTRY_&_HEADER(bkt, blk);
else
    /* Compare the input bucket number with the */
    /* previous one. */
    if bucket_number <> old_bucket_number
    then
        perform STORE_BACK(blk);
        /* Get the desired bucket entry for this */
        /* input record. */
        bkt = HT.bkt_entries[bucket_number];
        /* Check if the bucket is empty. */
        if bkt.status = empty
        then
            perform ALLOCATE_REC_BLOCK(blk, addr);
            perform MODIFY_ENTRY_&_HEADER(bkt,
                                          blk, addr);
        else
            /* Get the record block by the address */
            /* in the bucket entry. */
            perform GET_REC_BLOCK(bkt.block_address,
                                  blk);
        end if;
    end if;
    /* Check if the block has enough space to */
    /* store this record. */
    I = STRING_LENGTH(info);
    if (blk.header.length + I) > MAX_BLOCK_SIZE
    then
        /* This block does not have enough space */
        /* for this record. */
        perform ALLOCATE_REC_BLOCK(blk_2,
                                   addr_2);
        perform MODIFY_ENTRY_&_HEADER(bkt,
                                      blk_2,
                                      addr_2);
        /* This routine will also modify */
        /* the header of blk_2. */
        perform STORE_BACK(blk);
        blk = blk_2;
    end if;
end if;
perform STORE_INFO_IN_BLOCK(info, blk);
old_bucket_number = bucket_number;
end procedure STORE_RECORD_IN_HASH_TABLE;
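The block policy of STORE_RECORD_IN_HASH_TABLE is as follows: allocate a block when the bucket is empty, and chain a fresh block when the current one cannot hold the new entry. The Python sketch below illustrates it under simplifying assumptions. The disk transfers (GET_REC_BLOCK, STORE_BACK) and the old_bucket_number caching are omitted. An in-memory dictionary models the hashing table. MAX_BLOCK_SIZE = 16 and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_BLOCK_SIZE = 16  # hypothetical block capacity in characters


@dataclass
class RecBlock:
    contents: str = ""


@dataclass
class BucketEntry:
    blocks: List[RecBlock] = field(default_factory=list)

    @property
    def empty(self) -> bool:
        return not self.blocks


def store_record_in_hash_table(table: Dict[int, BucketEntry],
                               bucket_number: int, info: str) -> None:
    """Append `info` to the bucket's current block, chaining if full."""
    bkt = table.setdefault(bucket_number, BucketEntry())
    if bkt.empty:
        bkt.blocks.append(RecBlock())          # like ALLOCATE_REC_BLOCK
    blk = bkt.blocks[-1]
    if len(blk.contents) + len(info) > MAX_BLOCK_SIZE:
        blk = RecBlock()                       # new block chained to the bucket
        bkt.blocks.append(blk)
    blk.contents += info                       # like STORE_INFO_IN_BLOCK
```

Keeping the blocks of one bucket in a list mirrors the idea that a bucket entry leads to a chain of record blocks; a disk-based version would store block addresses instead of the blocks themselves.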


procedure STORE_BACK(input: A_structure);
/*******************************************************
 * This procedure is used to store a hashing table
 * or a record block back to the secondary storage.
 *
 * A_structure is a variable which may be either
 * a hashing table or a block.
 *******************************************************/
end procedure STORE_BACK;


procedure GET_REC_BLOCK(input: logical_addr;
                        output: blk);
/****************************************************
 * This procedure is used to bring a block of memory
 * from a predefined secondary storage area into the
 * primary memory by its logical address.
 * Data structures and variables used in this
 * procedure are:
 * 1. logical_addr:
 *    The logical address of a block.
 *    A variable of addr definition which is
 *    defined in the commdata.def file.
 * 2. blk:
 *    A variable of type REC_BLOCK which is defined
 *    in hashing_module.def (see Appendix G).
 ****************************************************/
end procedure GET_REC_BLOCK;
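GET_REC_BLOCK and STORE_BACK are specified only by their interfaces. As a sketch, assuming the secondary storage area is modelled as an in-memory array indexed by logical address (the names `disk`, `DISK_BLOCKS`, `get_rec_block` and `store_back` are hypothetical), the two operations reduce to block copies in opposite directions. Unlike the specification, this `store_back` takes the target address explicitly, since the sketch's block type does not record its own address.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLK_SIZE 64   /* hypothetical block size */
#define DISK_BLOCKS  16   /* hypothetical size of the storage area */

typedef int addr;         /* stand-in for the addr definition in commdata.def */

typedef struct {
    int  length;          /* total length of the records in this block */
    addr next_blk_addr;   /* next block of the same bucket */
} BLK_HEADER;

typedef struct {
    BLK_HEADER header;
    char contents[MAX_BLK_SIZE];
} REC_BLOCK;

/* An array standing in for the predefined secondary storage area. */
static REC_BLOCK disk[DISK_BLOCKS];

/* GET_REC_BLOCK: copy the block at logical_addr into primary memory. */
void get_rec_block(addr logical_addr, REC_BLOCK *blk)
{
    assert(logical_addr >= 0 && logical_addr < DISK_BLOCKS);
    memcpy(blk, &disk[logical_addr], sizeof *blk);
}

/* STORE_BACK (block case): write the in-memory block back to storage. */
void store_back(addr logical_addr, const REC_BLOCK *blk)
{
    assert(logical_addr >= 0 && logical_addr < DISK_BLOCKS);
    memcpy(&disk[logical_addr], blk, sizeof *blk);
}
```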
procedure STORE_INFO_IN_BLOCK(input: info, blk);
/****************************************************
 * This procedure is used to store the common
 * attribute value of a record and the record
 * itself into a block.
 * It is called only when the block has enough
 * space for that information, i.e., info.
 * Data structures and variables used in this
 * procedure are:
 * 1. info:
 *    A character string which contains the
 *    common attribute value of a record and
 *    the record itself.
 * 2. blk:
 *    A variable of type REC_BLOCK which is
 *    defined in hashing_module.def (see
 *    Appendix G).
 * 3. i, j:
 *    General purpose indexes.
 ****************************************************/
   i = 0;
   j = blk.header.length + 1;
   repeat
      blk.contents[j] = info[i];
      j = j + 1;
      i = i + 1;
   until i = STRING_LENGTH(info);
end procedure STORE_INFO_IN_BLOCK;
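A C rendering of STORE_INFO_IN_BLOCK, using 0-based indexes in place of the 1-based `j = blk.header.length + 1`. It is a sketch only: `MAX_BLK_SIZE` is a made-up value, and it assumes that `header.length` should be advanced after copying, because the free-space test in STORE_RECORD_IN_HASH_TABLE depends on that field.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLK_SIZE 64   /* hypothetical block size */

typedef struct {
    int length;           /* bytes of contents[] already in use */
    int next_blk_addr;
} BLK_HEADER;

typedef struct {
    BLK_HEADER header;
    char contents[MAX_BLK_SIZE];
} REC_BLOCK;

/* Append info (common attribute value followed by the record) after the
   bytes already stored in blk. The caller must have checked that it fits. */
void store_info_in_block(const char *info, REC_BLOCK *blk)
{
    int len = (int)strlen(info);
    assert(blk->header.length + len <= MAX_BLK_SIZE);
    int j = blk->header.length;          /* first free position (0-based) */
    for (int i = 0; i < len; i++)
        blk->contents[j++] = info[i];
    blk->header.length = j;              /* assumption: length kept current here */
}
```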
procedure MODIFY_ENTRY_&_HEADER(input: bkt, blk,
                                       blk_addr;
                                output: bkt, blk);
/****************************************************
 * This procedure is used to modify the bucket
 * entry of the input bkt and the header part
 * of the input blk. It will then return these
 * modified bkt and blk back to the calling
 * procedure.
 * Data structures and variables used in this
 * procedure are:
 * 1. bkt:
 *    A variable of type Bucket_entry
 *    which is defined in hashing_module.def
 *    (see Appendix G).
 * 2. blk:
 *    A variable of type REC_BLOCK which
 *    is defined in hashing_module.def
 *    (see Appendix G).
 * 3. blk_addr:
 *    A variable of type addr definition
 *    which is the logical address of a block
 *    and is defined in the commdata.def file.
 ****************************************************/
   blk.header.next_blk_addr = bkt.block_address;
   bkt.block_address = blk_addr;
end procedure MODIFY_ENTRY_&_HEADER;
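MODIFY_ENTRY_&_HEADER is a head insertion into a singly linked list of blocks: the new block inherits the bucket's old first address, and the bucket entry is then pointed at the new block. A minimal C sketch, assuming integer logical addresses and a hypothetical `NULL_ADDR` for an empty chain; marking the entry as non-empty is an added assumption, and `modify_entry_and_header` is a hypothetical name.

```c
#include <assert.h>

typedef int addr;          /* stand-in for the addr definition */
#define NULL_ADDR (-1)     /* hypothetical null address */

typedef struct {
    char status;           /* '1' = not empty, '0' = empty */
    addr block_address;    /* first block of this bucket's chain */
} Bucket_entry;

typedef struct {
    int  length;
    addr next_blk_addr;
} BLK_HEADER;

/* Link a newly allocated block (at blk_addr) in front of the bucket's chain. */
void modify_entry_and_header(Bucket_entry *bkt, BLK_HEADER *hdr, addr blk_addr)
{
    assert(blk_addr != NULL_ADDR);
    hdr->next_blk_addr = bkt->block_address;  /* new block points at old head */
    bkt->block_address = blk_addr;            /* bucket now points at new block */
    bkt->status = '1';                        /* assumption: mark as non-empty */
}
```

For an empty bucket, the old head is the null address, so the first block of a chain automatically ends up marked as the last one.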
procedure BROADCAST_TARGET_INFO(input: HT);
/****************************************************
 * This procedure is used to broadcast the records
 * of the target hashing table to the other
 * backends.
 * This is the same procedure that is used to
 * broadcast the descriptor ids among backends.
 * Data structures and variables used in this
 * procedure are:
 * 1. HT:
 *    A variable of type hashing table
 *    which is defined in hashing_module.def
 *    (see Appendix G).
 * 2. i:
 *    A general purpose index.
 * 3. MAX_BKT_#:
 *    An integer which is used to represent the
 *    maximum number of the bucket entries in a
 *    hashing table.
 * 4. bkt:
 *    A variable of type Bucket_entry which
 *    is defined in hashing_module.def (see
 *    Appendix G).
 * 5. msg:
 *    A character string which is used to store
 *    the message that is to be broadcast to all
 *    of the backends.
 ****************************************************/
   for i = 1 to MAX_BKT_# do
      bkt = HT.bkt_entries[i];
      if bkt.status <> empty
      then
         /* Put the bucket number into the message. */
         perform GET_REC_BLOCK(bkt.block_address, blk);
         last = false;
         repeat
            /* Extract the contents of the */
            /* blk.contents and copy them into msg. */
            if the msg is full
            then
               send msg to all of the backends;
               reset the length of msg to 0;
            end if;
            if blk.header.next_blk_addr = blk.own_address
            then
               /* This block is the last block for
                  this bucket. */
               last = true;
            else
               /* Move on to the next block of this bucket. */
               perform GET_REC_BLOCK(blk.header.next_blk_addr, blk);
            end if;
         until last;
      end if;
   end for;
   send the msg to all of the other backends;
end procedure BROADCAST_TARGET_INFO;
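The message handling in BROADCAST_TARGET_INFO follows a fill-and-flush pattern: data is copied into `msg`, the message is sent whenever it is full, and whatever remains is sent once after the loop. The sketch below assumes a tiny hypothetical capacity `MSG_CAP` and replaces the real inter-backend send with a counter; `Msg`, `send_to_backends` and `append_to_msg` are placeholders, not MBDS routines.

```c
#include <assert.h>

#define MSG_CAP 8   /* hypothetical message capacity */

typedef struct {
    char text[MSG_CAP];
    int  length;
    int  sends;                 /* number of messages "sent" so far */
} Msg;

/* Placeholder for the MBDS inter-backend send; here it only counts calls. */
static void send_to_backends(Msg *m)
{
    if (m->length > 0)
        m->sends++;
    m->length = 0;              /* reset the length of msg to 0 */
}

/* Copy n bytes into msg, sending it whenever it fills up. */
void append_to_msg(Msg *m, const char *data, int n)
{
    assert(n >= 0);
    for (int k = 0; k < n; k++) {
        if (m->length == MSG_CAP)
            send_to_backends(m);
        m->text[m->length++] = data[k];
    }
}
```

After the last bucket has been copied in, one final `send_to_backends` call flushes the partially filled message, matching the last statement of the procedure.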


APPENDIX F
THE MERGING PROCEDURE PROGRAM SPECIFICATIONS

procedure MERGE(input: source_request_id,
                       logical_address_of_source_table,
                       logical_address_of_target_table);
/****************************************************
 * This procedure is used to perform the merging
 * operation over the source records and the target
 * records.
 * Notice that the input addresses are the logical
 * disk addresses of the two hashing tables.
 * Data structures and variables used in this
 * procedure are:
 * 1. logical_address_of_source_table,
 *    logical_address_of_target_table:
 *    The logical disk addresses of the source
 *    and the target hashing tables, both of the type
 *    address definition which is defined in the
 *    commdata.def file.
 * 2. source_table, target_table:
 *    Variables of hashing table data type (see
 *    Appendix G) that represent the source hashing
 *    table and the target hashing table.
 * 3. i: A general purpose index.
 * 4. max_bucket_number:
 *    The largest bucket number of a hashing table.
 ****************************************************/
/****************************************************
 * Retrieve the two hashing tables by the input
 * logical addresses.
 * Note: Due to the limited memory space, we may
 * not be able to bring in the entire table.
 ****************************************************/
   perform GET_HASH_TABLE(logical_address_of_source_table,
                          source_table);
   perform GET_HASH_TABLE(logical_address_of_target_table,
                          target_table);
   /* Reserve a result buffer. */
   perform GET_BUFFER(result_buffer, source_request_id);
   /* This routine will allocate an instance of a
      result buffer, put the request id into the
      header of the buffer and initialize the
      length of the buffer to 0.
      This routine has already been coded in
      the retp.c file. */
   i = 0;
   while i < max_bucket_number do
      if ((source_table.bucket_entry[i].status <> empty)
          and
          (target_table.bucket_entry[i].status <> empty))
      then
         /* There is a collision. */
         /* Retrieve the records from both blocks and
            perform the merging operation. */
         x = source_table.bucket_entry[i].logical_address;
         y = target_table.bucket_entry[i].logical_address;
         perform MERGING_OPERATION(x, y, result_buffer);
         /* This routine will perform the merging
            operation and send the merged results
            to the controller. */
      end if;
      i = i + 1;
   end while;
   /* Signal PP upon the completion of the source and */
   /* target request. */
end procedure MERGE;
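MERGE calls MERGING_OPERATION only for buckets that are occupied in both tables. This works because, assuming both tables were built with the same hash function and the same table size, records with equal common attribute values always land in the same bucket number. A small C sketch of that selection step, with a hypothetical 8-bucket table and a hypothetical function `colliding_buckets`:

```c
#include <assert.h>

#define NUM_BUCKETS 8          /* hypothetical; Appendix G uses 2048 */

typedef struct {
    int empty;                 /* 1 = empty, 0 = holds records */
    int block_address;         /* first block of the bucket */
} Bucket_entry;

/* Collect every bucket number occupied in both tables; only those buckets
   can hold matching common attribute values. Returns how many bucket
   numbers were written to out[]. */
int colliding_buckets(const Bucket_entry src[], const Bucket_entry tgt[],
                      int out[])
{
    int n = 0;
    for (int i = 0; i < NUM_BUCKETS; i++)
        if (!src[i].empty && !tgt[i].empty)
            out[n++] = i;      /* MERGING_OPERATION would run on this pair */
    assert(n <= NUM_BUCKETS);
    return n;
}
```

Buckets that are filled on only one side are skipped without reading any record blocks, which is where the hashed approach saves work over comparing every source record with every target record.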
procedure MERGING_OPERATION
   (input: logical_address_source_block,
           logical_address_target_block,
           result_buffer;
    output: result_buffer);
/****************************************************
 * This procedure is used to perform the following
 * tasks:
 * 1. Extract the records from both the source
 *    block and the target block.
 * 2. Compare the common attribute values
 *    of the source and target records.
 *    If they are equal, then perform the merging
 *    operation.
 * 3. Put the merged results into a result buffer.
 *    If the buffer is full, then send the buffer
 *    to the controller and reinitialize the
 *    buffer length to 0 so that the buffer can
 *    be reused.
 *    Otherwise, return the logical address of
 *    the result buffer to the calling procedure.
 * Data structures and variables used in this
 * procedure are:
 * 1. source_block, target_block:
 *    Variables of the data type BKT_BLK which
 *    are used to represent the blocks of the
 *    source hashing table or the target hashing
 *    table.
 *    BKT_BLK is defined in hashing_module.def
 *    (see Appendix G).
 * 2. source_done, target_done:
 *    Boolean variables which are used to indicate
 *    the completion of processing either source
 *    records or target records.
 * 3. i, j: General purpose indexes.
 * 4. target_address:
 *    The logical address of the target block
 *    currently being compared.
 ****************************************************/
   source_done = false;
   /* Continue retrieving the source blocks by the */
   /* logical address until there are no more blocks. */
   repeat
      source_block =
         GET_BLOCK(logical_address_source_block);
      /* Rescan the target chain from its first block */
      /* for every source block. */
      target_address = logical_address_target_block;
      target_done = false;
      /* Continue retrieving the target blocks by the */
      /* logical address until there are no more blocks. */
      repeat
         target_block = GET_BLOCK(target_address);
         i = 0;
         while source_block.body[i] <> EOB do
            /* Retrieve one common attribute value and one */
            /* record from the source block. */
            source_value = GET_VALUE(source_block.body, i);
            source_record = GET_RECORD(source_block.body, i);
            j = 0;
            while target_block.body[j] <> EOB do
               /* Retrieve one common attribute value and */
               /* one record from the target block. */
               target_value = GET_VALUE(target_block.body, j);
               target_record =
                  GET_RECORD(target_block.body, j);
               if source_value = target_value
               then
                  /* Append target record at the end of */
                  /* source record and put the newly */
                  /* merged record into the result buffer. */
                  result = APPEND(source_record,
                                  target_record);
                  result_length = STRING_LENGTH(result);
                  perform RB_PUT_SEND(result_buffer,
                                      result,
                                      result_length);
               end if;
               /* Go to the next target record. */
               j = j + 1;
            end while; /* End the target-record loop. */
            i = i + 1;
         end while; /* End the source-record loop. */
         /* Are the target records done? */
         if target_block.header.next_block_address =
            target_block.header.this_block_address
         then
            target_done = true;
         else
            target_address =
               target_block.header.next_block_address;
         end if;
      until target_done;
      /* Are the source records done? */
      if source_block.header.next_block_address =
         source_block.header.this_block_address
      then
         source_done = true;
      else
         logical_address_source_block =
            source_block.header.next_block_address;
      end if;
   until source_done;
end procedure MERGING_OPERATION;
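Within one pair of buckets, MERGING_OPERATION is a nested-loop join on the common attribute value: every source record is compared with every target record, and each match yields the source record followed by the target record. The sketch below keeps records in memory as (value, record) string pairs rather than parsing EOB-delimited block bodies, and writes results into a fixed array rather than calling RB_PUT_SEND; `Entry`, `RESULT_LEN` and `merge_bucket` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RESULT_LEN 64   /* hypothetical maximum merged-record length */

typedef struct {
    const char *value;   /* common attribute value */
    const char *record;  /* the record itself */
} Entry;

/* Compare every source entry with every target entry of one bucket pair.
   For each pair with equal common attribute values, append the target
   record to the source record and store the result in out[].
   Returns the number of merged records written. */
int merge_bucket(const Entry src[], int ns, const Entry tgt[], int nt,
                 char out[][RESULT_LEN], int max_out)
{
    int produced = 0;
    for (int i = 0; i < ns; i++) {          /* source-record loop */
        for (int j = 0; j < nt; j++) {      /* target-record loop */
            if (strcmp(src[i].value, tgt[j].value) == 0) {
                assert(produced < max_out);
                snprintf(out[produced], RESULT_LEN, "%s%s",
                         src[i].record, tgt[j].record);
                produced++;
            }
        }
    }
    return produced;
}
```

The quadratic comparison stays cheap in practice because it only runs over records that already share a bucket.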
APPENDIX G
THE HASHING MODULE DATA STRUCTURE DEFINITIONS

In this appendix we present the definitions of the data
structures used in the previous appendices. We refer to the
definitions as hashing_module.def.


1. Hash_buffer:

   This is the buffer which stores the hashed information
   of records.

   +----------------+
   | Request id     | --> The request id of the hashed records.
   +----------------+
   | Length         | --> The current length of the hashed results.
   +----------------+
   | Hashed results | --> An array of characters used for storing
   |                |     the hashed records.
   +----------------+
   The format of the hashed results is:

      {hashed_record_info}+ EOReq EOB

   where
      hashed_record_info ::= bucket_number EOV {Rec}+
      Rec ::= {attribute_value_pair}+ EORec
      attribute_value_pair ::=
         attribute_name EON attribute_value EOV

   "+" means one or more occurrences.

   EOB  : A special character which is used as a marker
          for the end-of-buffer.

   EOV  : A special character which is used as a marker
          for the end-of-value.

   EON  : A special character which is used as a marker
          for the end-of-attribute-name.

   EORec: A special character which is used as a marker
          for the end-of-record.

   EOReq: A character, either 1 or 0, which is
          used to indicate the end of a request.
          1: end of a request.
          0: not end of request, more buffers are coming.


2. REC_BLOCK:

   Blocks used by buckets to store the records and their
   common attribute values.

   A REC_BLOCK is composed of two fields, a header
   and a contents.

   +----------+
   | header   | --> This part contains the status
   |          |     of this block.
   +----------+
   | contents | --> This part contains the records
   |          |     and their common attribute values.
   +----------+

   The format of the contents of the REC_BLOCK is:

      {Rec}+ EOB

   The header contains two parts:

   +---------------+
   | length        | --> An integer to indicate the total
   |               |     length of the records in this block.
   +---------------+
   | next_blk_addr | --> The logical address of the next
   |               |     block of the same bucket. (If
   |               |     this block is the last block of
   |               |     the bucket, then a null address
   |               |     will be put in here.)
   |               |     The type of this field is
   |               |     address definition and is
   |               |     defined in the commdata.def file.
   +---------------+
3. Bucket_entry:

   +---------------+
   | status        | --> A character which is either 1 for
   |               |     not empty or 0 for empty.
   +---------------+
   | block_address | --> The logical address of the block
   |               |     of this bucket.
   +---------------+

4. Hash_table: An array of 2048 Bucket_entries.
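The definitions above could be written as C declarations roughly as follows. Only the 2048-entry table size comes from this appendix; the block size, the `addr` type, `NULL_ADDR` and the printable marker characters are placeholders, since the real values live in commdata.def and the MBDS sources. `encode_rec` is a hypothetical helper that shows the `Rec` grammar in action.

```c
#include <assert.h>
#include <string.h>

/* Only MAX_BKT_NUM comes from the text; the rest are placeholders. */
#define MAX_BKT_NUM  2048
#define MAX_BLK_SIZE 512
#define NULL_ADDR    (-1L)
#define EON   '#'      /* end of attribute name  */
#define EOV   '|'      /* end of attribute value */
#define EOREC '$'      /* end of record          */

typedef long addr;     /* stand-in for the addr definition in commdata.def */

typedef struct {
    int  length;          /* total length of the records in this block */
    addr next_blk_addr;   /* next block of the same bucket, or NULL_ADDR */
} REC_BLOCK_HEADER;

typedef struct {
    REC_BLOCK_HEADER header;
    char contents[MAX_BLK_SIZE];   /* {Rec}+ EOB */
} REC_BLOCK;

typedef struct {
    char status;          /* '1' = not empty, '0' = empty */
    addr block_address;   /* logical address of the bucket's block */
} Bucket_entry;

typedef struct {
    Bucket_entry bkt_entries[MAX_BKT_NUM];
} Hash_table;

/* Encode one Rec: {attribute_name EON attribute_value EOV}+ EORec.
   Writes a NUL-terminated string into out (capacity cap) and returns
   the number of characters written, excluding the NUL. */
int encode_rec(const char *names[], const char *values[], int n,
               char *out, int cap)
{
    assert(n >= 1);
    int len = 0;
    for (int k = 0; k < n; k++) {
        int nl = (int)strlen(names[k]);
        int vl = (int)strlen(values[k]);
        assert(len + nl + vl + 3 < cap);  /* pair, EORec and NUL must fit */
        memcpy(out + len, names[k], (size_t)nl);
        len += nl;
        out[len++] = EON;
        memcpy(out + len, values[k], (size_t)vl);
        len += vl;
        out[len++] = EOV;
    }
    out[len++] = EOREC;
    out[len] = '\0';
    return len;
}
```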